verneuil
go-ds-crdt
Metric | verneuil | go-ds-crdt
---|---|---
Mentions | 5 | 7
Stars | 392 | 357
Growth | 1.5% | 1.7%
Activity | 6.7 | 6.1
Last commit | 2 months ago | 3 months ago
Language | C | Go
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
verneuil
- Show HN: Query SQLite files stored in S3
- Embedded database with VFS support?
It'd be process-wide. If you want an example, you can check out the one using a VFS here. There's an explicit passing of the VFS there and an implicit usage of it. https://github.com/backtrace-labs/verneuil/blob/main/examples/rusqlite_integration.rs
- LiteFS a FUSE-based file system for replicating SQLite
- A database for 2022 · Tailscale
It doesn't even have to be a WAL-based system. Backtrace Labs has a SQLite virtual file system (VFS) called Verneuil that works similarly but uses the rollback journal instead of the WAL.
- Ask HN: P2P Databases?
https://github.com/backtrace-labs/verneuil/ is one way to address the diffing / read replica part of the problem. I believe it's compatible with gossipping: most of the data is in small content-addressed chunks, with small manifests that tell clients what chunks to fetch and how to reassemble them to recreate a sqlite database. There's already client-side caching to persistent storage, and chunks can be fetched on demand.
Sharing replication data P2P, while retaining the simplicity of a single authoritative writer per database, is explicitly part of the project's long-term goals!
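The chunk-and-manifest layout described in the last excerpt can be illustrated with a minimal Go sketch. This is not Verneuil's actual format: the fixed 4-byte chunk size, the SHA-256 keys, the in-memory `Store` map, and the `Split` and `Reassemble` helpers are all hypothetical, chosen only to show how a small manifest of content hashes lets a reader fetch just the chunks it is missing.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkSize is deliberately tiny for the demo; a real system would use
// much larger chunks. This value is an assumption, not Verneuil's.
const chunkSize = 4

// Store is a hypothetical content-addressed chunk store: hex hash -> bytes.
type Store map[string][]byte

// Split cuts data into fixed-size chunks, stores each chunk under the hex
// SHA-256 of its contents, and returns the manifest (ordered list of keys).
func Split(data []byte, store Store) []string {
	var manifest []string
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[off:end])
		key := hex.EncodeToString(sum[:])
		// Identical chunks share a key, so unchanged data is stored once.
		if _, ok := store[key]; !ok {
			store[key] = append([]byte(nil), data[off:end]...)
		}
		manifest = append(manifest, key)
	}
	return manifest
}

// Reassemble rebuilds the original bytes by looking up each manifest entry.
func Reassemble(manifest []string, store Store) ([]byte, error) {
	var out bytes.Buffer
	for _, key := range manifest {
		chunk, ok := store[key]
		if !ok {
			return nil, fmt.Errorf("missing chunk %s", key)
		}
		out.Write(chunk)
	}
	return out.Bytes(), nil
}

func main() {
	store := Store{}
	v1 := Split([]byte("aaaabbbbcccc"), store)
	v2 := Split([]byte("aaaaXXXXcccc"), store) // only the middle chunk differs
	got, err := Reassemble(v2, store)
	// Two versions of three chunks each, but only four distinct chunks stored.
	fmt.Println(len(v1), len(store), string(got), err)
}
```

Because the two versions share their first and last chunks, a replica that already holds the first version only needs the new manifest and one new chunk to reconstruct the second.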
go-ds-crdt
- CRDTs Turned Inside Out
I forgot: key-value store using MD-CRDTs was implemented here: https://github.com/ipfs/go-ds-crdt
The trickiest part was not the CRDT, but the DAG traversal with multiple workers processing parallel updates on multiple branches and switching CRDT-DAG roots as they finish branches.
- We Put IPFS in Brave
In https://github.com/ipfs/go-ds-crdt, every node in the Merkle DAG has a "Priority" field. When adding a new head, this is set to (maximum of the priorities of the children)+1.
Thus, this priority represents the current depth (or height) of the DAG at each node. It is sort of a timestamp and you could use a timestamp, or whatever helps you sort. In the case of concurrent writes, the write with highest priority wins. If we have concurrent writes of same priority, then things are sorted by CID.
The idea here is that in general, a node that is lagging behind or not syncing would have a dag with less depth, therefore its writes would have less priority when they conflict with writes from others that have built deeper DAGs. But this is after all an implementation choice, and the fact that a DAG is deeper does not mean that the last write on a key happened "later".
- Making CRDTs Byzantine Fault Tolerant [pdf]
The idea of DAG-embedded CRDTs is far from new and was introduced here:
https://arxiv.org/abs/2004.00107 (I'm among the authors)
Unfortunately, the verification that the author proposes (not accepting new updates until the dag below is verified) will need a lot of caveats for real world usage.
Currently we use these CRDTs for a key value database of 40M+ keys in a deployment of ipfs-cluster, which uses https://github.com/ipfs/go-ds-crdt .
- Ask HN: P2P Databases?
- Go-ds-CRDT: distributed datastore using Merkle-CRDTs
- Conflict-free replicated datatypes solve distributed data consistency challenges
- Data Laced with History: Causal Trees and Operational CRDTs (2018)
Not 100% the thing, but potentially related work in this area:
https://github.com/ipfs/go-ds-crdt
(See link to paper, and links to other projects in it, like OrbitDB).
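The priority rule quoted in the "We Put IPFS in Brave" excerpt above (a new head gets the maximum priority of its children plus one; concurrent writes are resolved by priority, then by CID) can be sketched in a few lines of Go. The `Node` type, `NewHead`, and `Wins` below are hypothetical names for illustration and are not go-ds-crdt's API. The excerpt does not say which direction the CID ordering runs, so treating the lexicographically larger CID as the winner is an assumption.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a Merkle-DAG node. The field names
// are illustrative and are not taken from go-ds-crdt.
type Node struct {
	CID      string // content identifier, used here only as a tie-breaker
	Priority uint64
	Children []*Node
}

// NewHead creates a node whose priority is max(children priorities) + 1,
// so a node with no children gets priority 1.
func NewHead(cid string, children ...*Node) *Node {
	var maxP uint64
	for _, c := range children {
		if c.Priority > maxP {
			maxP = c.Priority
		}
	}
	return &Node{CID: cid, Priority: maxP + 1, Children: children}
}

// Wins reports whether write a beats concurrent write b: higher priority
// wins, and equal priorities fall back to comparing CIDs.
// Assumption: the larger CID wins the tie.
func Wins(a, b *Node) bool {
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	return a.CID > b.CID
}

func main() {
	root := NewHead("cid-root")
	a1 := NewHead("cid-a1", root)
	a2 := NewHead("cid-a2", a1) // replica A has built a deeper branch
	b1 := NewHead("cid-b1", root)
	fmt.Println(a2.Priority, b1.Priority, Wins(a2, b1)) // 3 2 true
}
```

As the excerpt notes, a deeper DAG only means more accumulated updates on that branch, not that the write happened later in wall-clock time.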
What are some alternatives?
litefs - FUSE-based file system for replicating SQLite databases across a cluster of machines
merkle-crdt - Merkle-Clock CRDT implementation in python
dqlite - Embeddable, replicated and fault-tolerant SQL engine.
differential-dataflow - An implementation of differential dataflow using timely dataflow on Rust.
WCDB - WCDB is a cross-platform database framework developed by WeChat.
yjs - Shared data types for building collaborative software
bb-remote-execution - Tools for Buildbarn to allow remote execution of build actions
Apache Ignite - Apache Ignite
s3fs - S3 Filesystem
yata - YATA based algorithm for plain text CRDT edit merging in python
s3sqlite - Query SQLite files in S3 using s3fs
crdt-study - A Python study of distributed, conflict-free Last-Writer-Wins (LWW) undirected graphs