Top 8 C++ distributed-database Projects

ClickHouse

Mentions: 208 · Stars: 34,054 · Activity: 10.0 · Language: C++

ClickHouse® is a free analytics DBMS for big data

Project mention: We Built a 19 PiB Logging Platform with ClickHouse and Saved Millions | news.ycombinator.com | 2024-04-02

Yes, we are working on it! :) Taking some of the learnings from the current experimental JSON Object datatype, we are now working on what will become the production-ready implementation. Details here: https://github.com/ClickHouse/ClickHouse/issues/54864
The Variant datatype is already available as an experimental feature in 24.1, the Dynamic datatype is WIP (PR almost ready), and the JSON datatype is next up. Check out the latest comment on that issue for how the Dynamic datatype will work: https://github.com/ClickHouse/ClickHouse/issues/54864#issuec...
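To make the Variant idea concrete, here is a toy Python model of a column whose rows each hold NULL or a value of one type from a fixed list. It keeps a per-row type tag (sometimes called a discriminator) plus one dense list per allowed type. This is a hypothetical illustration of the general tagged-union layout, not ClickHouse's actual implementation. `ToyVariantColumn` and its methods are invented for the example.

```python
from typing import Any, Optional


class ToyVariantColumn:
    """Hypothetical tagged-union column: each row is NULL or a value of one allowed type."""

    def __init__(self, *types: type) -> None:
        self.types = types
        self.discriminators: list[Optional[int]] = []  # per-row index into self.types; None means NULL
        self.offsets: list[int] = []  # per-row position inside the matching per-type list
        self.parts: list[list[Any]] = [[] for _ in types]  # one dense list per allowed type

    def append(self, value: Any) -> None:
        if value is None:
            self.discriminators.append(None)
            self.offsets.append(-1)
            return
        # Exact type match, so a bool is not silently stored as an int.
        for idx, t in enumerate(self.types):
            if type(value) is t:
                self.discriminators.append(idx)
                self.offsets.append(len(self.parts[idx]))
                self.parts[idx].append(value)
                return
        raise TypeError(f"{type(value).__name__} is not one of {[t.__name__ for t in self.types]}")

    def __getitem__(self, row: int) -> Any:
        d = self.discriminators[row]
        return None if d is None else self.parts[d][self.offsets[row]]

    def __len__(self) -> int:
        return len(self.discriminators)


col = ToyVariantColumn(int, str)
for v in [42, "hello", None, 7]:
    col.append(v)
print([col[i] for i in range(len(col))])  # [42, 'hello', None, 7]
print(col.parts)  # [[42, 7], ['hello']]
```

Values of the same type end up stored contiguously. That is the property a columnar engine cares about. The tag and offset per row are what let a reader reassemble the original row order.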

foundationdb

Mentions: 21 · Stars: 13,948 · Activity: 9.8 · Language: C++

FoundationDB - the open source, distributed, transactional key-value store

Project mention: Figma's Databases team lived to tell the scale | news.ycombinator.com | 2024-03-14

Actually, Apple does this for iCloud! They use FoundationDB[1] to store billions of databases, one for each user (plus shared or global databases).
See: https://read.engineerscodex.com/p/how-apple-built-icloud-to-...
Discussed on HN at the time: https://news.ycombinator.com/item?id=39028672
[1]: https://github.com/apple/foundationdb https://en.wikipedia.org/wiki/FoundationDB
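One way to picture "one database per user" inside a single ordered key-value store is to prefix every key with the user's ID. Each user's data then occupies its own contiguous key range. The Python sketch below uses a sorted list and `bisect` as a stand-in for an ordered store. It is an assumption-laden illustration of key-prefix partitioning, not FoundationDB's API or Apple's actual key layout. `ToyOrderedKV` and `user_key` are hypothetical.

```python
import bisect
from typing import Any


class ToyOrderedKV:
    """Hypothetical in-memory ordered key-value store (stand-in for a real one)."""

    def __init__(self) -> None:
        self._keys: list[tuple] = []  # kept sorted at all times
        self._vals: dict[tuple, Any] = {}

    def set(self, key: tuple, value: Any) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def range_prefix(self, prefix: tuple) -> list[tuple[tuple, Any]]:
        # Keys sharing a prefix are contiguous in sorted order, so start at the
        # first key >= prefix and stop at the first key that no longer matches.
        out = []
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][: len(prefix)] == prefix:
            k = self._keys[i]
            out.append((k, self._vals[k]))
            i += 1
        return out


def user_key(user_id: str, *path: Any) -> tuple:
    # Hypothetical layout: every key in a user's "database" starts with ("user", user_id).
    return ("user", user_id, *path)


kv = ToyOrderedKV()
kv.set(user_key("alice", "notes", 1), "buy milk")
kv.set(user_key("bob", "notes", 1), "call mom")
kv.set(user_key("alice", "notes", 2), "ship release")

alice_db = kv.range_prefix(("user", "alice"))
print([v for _, v in alice_db])  # ['buy milk', 'ship release']
```

Tuples compare element by element, so a prefix like `("user", "alice")` sorts immediately before all of Alice's keys. That is what makes a per-user range scan cheap, even with billions of such logical databases sharing one keyspace.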

ArangoDB

Mentions: 17 · Stars: 13,340 · Activity: 9.9 · Language: C++

🥑 ArangoDB is a native multi-model database with flexible data models for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.

Project mention: Ask HN: When is pure functional programming beneficial? | news.ycombinator.com | 2023-07-11

... or working in an environment or on a problem for which functional patterns apply.
Suppose you are writing a "CRUD" app that writes to a relational database: how do you apply functional programming to that? The whole point of an application like that is to produce side effects.
In some cases you can break those problems down into functional pieces. Consider Python drivers for a product like
https://www.arangodb.com/
One major problem is that you want drivers that work both synchronously and asynchronously; the structure of the average API call is something like
   def query(parameters):
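The quote is cut off, but one common reading of the commenter's point is a "functional core, imperative shell" split. Pure functions build the request and parse the response, and thin sync and async wrappers perform the only side effect, the network call. All names below (`build_query_request`, `parse_query_response`, the injected `send` transports) are hypothetical and are not the python-arango driver's API. The endpoint path and field names are illustrative assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable

Request = dict[str, Any]
Response = dict[str, Any]


def build_query_request(aql: str, bind_vars: dict[str, Any]) -> Request:
    """Pure: turns parameters into a request description, no I/O."""
    return {"method": "POST", "path": "/_api/cursor", "body": {"query": aql, "bindVars": bind_vars}}


def parse_query_response(resp: Response) -> list[Any]:
    """Pure: turns a raw response into results or raises, no I/O."""
    if resp.get("error"):
        raise RuntimeError(resp.get("errorMessage", "query failed"))
    return resp["result"]


def query_sync(send: Callable[[Request], Response], aql: str, bind_vars: dict[str, Any]) -> list[Any]:
    # Imperative shell: the only side effect is the injected blocking `send`.
    return parse_query_response(send(build_query_request(aql, bind_vars)))


async def query_async(
    send: Callable[[Request], Awaitable[Response]], aql: str, bind_vars: dict[str, Any]
) -> list[Any]:
    # Same pure core, different shell: `send` is awaited instead of called.
    return parse_query_response(await send(build_query_request(aql, bind_vars)))


# Fake transports so the sketch runs without a database.
def fake_send(req: Request) -> Response:
    return {"error": False, "result": [req["body"]["bindVars"]["x"] * 2]}


async def fake_send_async(req: Request) -> Response:
    await asyncio.sleep(0)
    return fake_send(req)


print(query_sync(fake_send, "RETURN @x * 2", {"x": 21}))  # [42]
print(asyncio.run(query_async(fake_send_async, "RETURN @x * 2", {"x": 21})))  # [42]
```

Only the two short shell functions differ between the sync and async drivers. The request-building and response-parsing logic is written once and can be unit-tested without any I/O.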

oceanbase

Mentions: 8 · Stars: 7,340 · Activity: 10.0 · Language: C++

OceanBase is an enterprise distributed relational database with high availability, high performance, horizontal scalability, and compatibility with SQL standards.

Project mention: Show HN: OceanBase – An open-source distributed SQL database written in C++ | news.ycombinator.com | 2023-05-23

ydb

Mentions: 10 · Stars: 3,398 · Activity: 10.0 · Language: C++

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions

Project mention: Erasure Coding versus Tail Latency | news.ycombinator.com | 2024-03-28

There is https://ydb.tech/, an open-source DB that uses erasure coding for replication within a single zone/region.
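As a minimal illustration of the erasure-coding idea, the Python sketch below uses the simplest possible code. It stores k data fragments plus one XOR parity fragment, which is enough to rebuild any single lost fragment. This is an illustrative assumption, not YDB's scheme. Production systems typically use Reed-Solomon-style codes that tolerate several simultaneous losses. The function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal fragments (zero-padded) and append one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def recover(frags: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing fragment (None) as the XOR of all the others."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can only repair one lost fragment")
    frags = list(frags)  # copy so the caller's list is not modified
    if missing:
        frags[missing[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    return frags


def decode(frags: list[bytes], k: int, length: int) -> bytes:
    # Concatenate the data fragments and strip the zero padding.
    return b"".join(frags[:k])[:length]


msg = b"erasure coding demo"
stored: list[Optional[bytes]] = list(encode(msg, k=4))  # 4 data fragments + 1 parity
stored[2] = None  # lose one fragment (e.g. a failed disk)
print(decode(recover(stored), k=4, length=len(msg)))  # b'erasure coding demo'
```

With k = 4 this stores 5 fragments' worth of space for 4 fragments of data, a 1.25x overhead compared with 3x for triple replication. The cost is that rebuilding a lost fragment requires reading all the surviving ones.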

incubator-pegasus

Mentions: 5 · Stars: 1,944 · Activity: 9.4 · Language: C++

Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store
ytsaurus

Mentions: 4 · Stars: 1,765 · Activity: 10.0 · Language: C++

YTsaurus is a scalable and fault-tolerant open-source big data platform.
ScaleStore

Mentions: 2 · Stars: 105 · Activity: 3.4 · Language: C++

This is the source code for our paper published at SIGMOD’22 (Tobias Ziegler, Carsten Binnig, and Viktor Leis): ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA.

Project mention: Ask HN: Why are there no open source NVMe-native key value stores in 2023? | news.ycombinator.com | 2023-10-16

I don't remember exactly why I saved any of them, but here are some experimental data stores that seem to fit what you're looking for, at least somewhat:
- https://github.com/DataManagementLab/ScaleStore - "A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA"
- https://github.com/unum-cloud/udisk - "The fastest ACID-transactional persisted Key-Value store designed for NVMe block-devices with GPU-acceleration and SPDK to bypass the Linux kernel."
- https://github.com/capsuleman/ssd-nvme-database - "Columnar database on SSD NVMe"

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).