Turning SQLite into a Distributed Database

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • rqlite

    The lightweight, distributed relational database built on SQLite.

  • rook

    Storage Orchestration for Kubernetes

    So I just fell down the rabbit hole of figuring out how to use SQLite with Ceph (turns out a thing called libcephsqlite[0][1] exists) -- awesome to see this new take on distributed SQLite.

    The caveats for dqlite and rqlite always felt kind of awkward/risky to me -- in stark contrast to SQLite, which is so stable/"built in" that you don't think about its failure modes. Having to worry about exactly what I ran (e.g. RANDOM()) was just a non-starter (IIRC rqlite has this problem but not dqlite? Or the other way around -- one replicates at the statement level, the other at the WAL level).
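To make the RANDOM() concern concrete, here is a minimal sketch using Python's standard `sqlite3` module. It does not model rqlite or dqlite themselves, and it makes no claim about which project uses which strategy. The helpers `replicate_statement` and `replicate_rows` are hypothetical names that only contrast "re-run the SQL text on every node" with "ship the resulting data".

```python
import sqlite3

def new_replica():
    # Each in-memory connection stands in for one node (an assumption of this sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
    return conn

def replicate_statement(replicas, sql):
    """Statement-level: every replica re-executes the same SQL text."""
    for conn in replicas:
        conn.execute(sql)
        conn.commit()

def replicate_rows(leader, followers, sql):
    """Data-level: the leader executes once; followers copy the resulting rows."""
    leader.execute(sql)
    leader.commit()
    rows = leader.execute("SELECT id, v FROM t").fetchall()
    for conn in followers:
        conn.execute("DELETE FROM t")
        conn.executemany("INSERT INTO t (id, v) VALUES (?, ?)", rows)
        conn.commit()

def values(conn):
    return conn.execute("SELECT v FROM t ORDER BY id").fetchall()

# Re-executing a non-deterministic statement lets the replicas diverge.
a, b = new_replica(), new_replica()
replicate_statement([a, b], "INSERT INTO t (v) VALUES (random())")
statement_level_match = values(a) == values(b)

# Shipping the resulting data keeps the replicas identical.
c, d = new_replica(), new_replica()
replicate_rows(c, [d], "INSERT INTO t (v) VALUES (random())")
data_level_match = values(c) == values(d)
```

With statement-level replication, each node evaluates `random()` on its own, so the stored values almost certainly differ. Copying the resulting rows (or pages, or WAL frames) avoids that class of problem.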

    That said, the biggest sticking point with all this SQLite goodness is how to make sure that certain libraries (any popular extension -- vsv, spatialite, libcephsqlite) are loaded for any application using SQLite -- there seem to be only a few options:

    - calling load_extension[2] from code (this is somewhat frowned upon, but maybe it's fine)

    - LD_PRELOAD (mvsqlite does this)

    - Building your own SQLite and swapping out shared libs (mvsqlite also does this, because statically compiled SQLite is a nuisance)

    - Trapping/catching calls to dlopen (also basically requires LD_PRELOAD, but I guess you could go custom kernel or whatever)

    This is probably the one big wart of SQLite -- it's a bit difficult to pull in new interesting extensions.
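As an illustration of the first option (loading an extension from application code), here is a small sketch using Python's standard `sqlite3` module. The helper `try_load_extension` and the path `./hypothetical_extension` are made up for this example. Some Python builds are compiled without extension loading, which the helper treats as a failed load.

```python
import sqlite3

def try_load_extension(conn, path):
    """Attempt to load a SQLite extension; return True on success."""
    if not hasattr(conn, "enable_load_extension"):
        return False  # this Python build omits extension loading
    conn.enable_load_extension(True)
    try:
        conn.load_extension(path)
        return True
    except sqlite3.OperationalError:
        return False  # file missing or not a loadable extension
    finally:
        # Turn loading back off so later SQL cannot load arbitrary libraries.
        conn.enable_load_extension(False)

conn = sqlite3.connect(":memory:")
loaded = try_load_extension(conn, "./hypothetical_extension")
```

Extension loading is disabled by default and switched off again afterwards, which is one reason this route needs cooperation from every application rather than a single system-wide switch.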

    [0]: https://docs.ceph.com/en/latest/rados/api/libcephsqlite/

    [1]: https://github.com/rook/rook/issues/10689

    [2]: https://www.sqlite.org/lang_corefunc.html#load_extension

  • Bedrock

    Rock solid distributed database specializing in active/active automatic failover and WAN replication (by Expensify)

    Don’t forget BedrockDB (built on SQLite), which is used in production at Expensify.

    There is also a write-up on how it scales.

    https://bedrockdb.com/

    https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...

  • fdb-document-layer

    A document data model on FoundationDB, implementing MongoDB® wire protocol

    This is exactly what the engineers behind FoundationDB (FDB) wanted when they open-sourced it. For those who don't know, FDB provides a transactional (and distributed) ordered key-value store with a somewhat simple but very powerful API.

    Their vision was to build the hardest parts of a database themselves, such as transactions, fault tolerance, high availability, and elastic scaling. This would free users to build higher-level APIs (Layers) [1] and libraries [2] on top.

    The beauty of these layers is that you can basically remove doubt about the correctness of data once it leaves the layer. FoundationDB is one of the most tested databases out there, if not the most tested. I used it for over 4 years in high write/read production environments, and never once did we second-guess our decision.

    I could see this project renamed to simply "fdb-sqlite-layer"

    [1] https://github.com/FoundationDB/fdb-document-layer

  • mvsqlite

    Distributed, MVCC SQLite that runs on FoundationDB.

    Hi mrkurt!

    Litestream/LiteFS are amazing projects. The FUSE-based approach is interesting (I'm implementing something similar in mvSQLite, thanks for the idea!)

    > Graceful failure

    mvSQLite is designed to continue operating under a degraded network (there is a fault-injection test specifically for checking this property: https://github.com/losfair/mvsqlite/blob/1dd1a80d2ff7263b07a...). Network errors and service unavailability are handled with idempotent retries and are not exposed to the application.
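The general idea of idempotent retries can be sketched as follows. This is not mvSQLite's code. `FlakyStore` and `write_with_retries` are hypothetical, and the idempotency-key scheme is an assumption chosen to show why a retried write is not applied twice.

```python
import uuid

class FlakyStore:
    """Hypothetical remote store that applies a write, then loses the first ack."""

    def __init__(self):
        self.applied = {}          # idempotency key -> value
        self.log = []              # values in the order they were applied
        self.fail_next_ack = True

    def write(self, key, value):
        if key not in self.applied:    # a repeated key is a no-op
            self.applied[key] = value
            self.log.append(value)
        if self.fail_next_ack:
            self.fail_next_ack = False
            raise ConnectionError("ack lost")
        return "ok"

def write_with_retries(store, value, attempts=3):
    key = str(uuid.uuid4())        # the same key is reused for every retry
    for _ in range(attempts):
        try:
            return store.write(key, value)
        except ConnectionError:
            continue               # hidden from the caller unless all attempts fail
    raise ConnectionError("gave up after %d attempts" % attempts)

store = FlakyStore()
result = write_with_retries(store, "row-1")
```

The first attempt is applied but its acknowledgement is lost. The retry carries the same key, so the store recognizes it and the value lands exactly once.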

    > Good for caching

    mvSQLite caches pages read and written, and does differential cache invalidation (only remotely modified pages are invalidated in the local page cache). The local cache is just a regular KV store with invalidation strategies, and can be moved onto the disk. So it essentially becomes a consistent local database snapshot.
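A toy version of differential invalidation might look like the sketch below. The `PageCache` class and its `advance` method are hypothetical and not taken from mvSQLite. The sketch assumes the remote store can report which pages changed between two versions.

```python
class PageCache:
    """Local page cache that stays consistent with one snapshot version."""

    def __init__(self):
        self.pages = {}      # page number -> page bytes
        self.version = 0     # snapshot version the cached pages belong to

    def put(self, page_no, data):
        self.pages[page_no] = data

    def get(self, page_no):
        return self.pages.get(page_no)   # None means "fetch from the remote store"

    def advance(self, new_version, remotely_modified):
        """Move to new_version, dropping only pages that changed remotely."""
        for page_no in remotely_modified:
            self.pages.pop(page_no, None)
        self.version = new_version

cache = PageCache()
for n in (1, 2, 3):
    cache.put(n, b"page-%d" % n)

# Another writer changed page 2; pages 1 and 3 remain valid and stay cached.
cache.advance(new_version=1, remotely_modified={2})
```

Because untouched pages survive each version bump, the cache behaves like a consistent local snapshot that only refetches what actually changed.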

  • litefs

    FUSE-based file system for replicating SQLite databases across a cluster of machines

    (That one replaced SQLite's btree, this one puts pages of the btree as values in the key-value store.)

    Another approach using FUSE, making arbitrary SQLite-using applications leader-replica style distributed for HA: https://github.com/superfly/litefs (see also https://litestream.io/ for WAL-streaming backups, that's the foundation of this)

  • litestream

    Streaming replication for SQLite.
