Top 23 Distributed System Open-Source Projects
-
advanced-java
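😮 Core Interview Questions & Answers For Experienced Java (Backend) Developers | A complete primer on advanced knowledge for internet Java engineers: covering high concurrency, distributed systems, high availability, microservices, massive data processing, and more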
-
awesome-scalability
Project mention: The Patterns of Scalable, Reliable, and Performant Large-Scale Systems | news.ycombinator.com | 2024-12-19
-
etcd
Project mention: Securing Kubernetes: Encrypting Data at Rest with kubeadm and containerd on Amazon Linux 2023 | dev.to | 2025-04-15
curl -LO https://github.com/etcd-io/etcd/releases/download/v3.5.21/etcd-v3.5.21-linux-amd64.tar.gz
tar xzf etcd-v3.5.21-linux-amd64.tar.gz
-
Dubbo
Let's look at the example from Apache Dubbo:
-
system-design
"System Design" by Karan Pratap Singh: how to design systems at scale and prepare for system design interviews.
-
spacedrive
Spacedrive is an open source cross-platform file explorer, powered by a virtual distributed filesystem written in Rust.
-
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Boosting is not a separate ML model but a technique that combines multiple weak learners into a single model that can produce highly accurate predictions. XGBoost is a widely used boosting library that supports distributed training, which makes training faster. According to research by Intel, XGBoost can be more effective than a neural-network-based approach on tabular data. XGBoost is also faster to train and doesn’t need as much data as neural networks do.
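To make "combining weak learners" concrete, here is a minimal sketch of plain gradient boosting for squared-error regression. It is not XGBoost's algorithm (no regularization, no second-order gradient statistics, no distributed training), and the names `fit_stump` and `gradient_boost` and the toy data are hypothetical. Each weak learner is a one-split stump, and every round fits a new stump to the residuals the previous rounds left behind.

```python
def fit_stump(xs, residuals):
    """Weak learner: a one-split regression stump fitted to residuals by least squares."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)  # never empty: t is itself a data point
        rv = sum(right) / len(right) if right else 0.0
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs, ys, n_rounds=200, learning_rate=0.3):
    """Additive model: the mean prediction plus a shrunken sum of stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is simply the residual
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


# A three-level staircase: no single stump can fit it, but the boosted sum can
xs = list(range(1, 10))
ys = [1, 1, 1, 4, 4, 4, 9, 9, 9]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in (2, 5, 8)])  # [1.0, 4.0, 9.0]
```

The `learning_rate` shrinkage trades more rounds for a smoother fit; production libraries grow deeper trees and add regularization on top of this additive idea.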
-
nsq
https://nsq.io/ is also very reliable, stable, lightweight, and easy to use.
-
seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
I’m interested in how it compares to seaweedfs[1], which we use to store weather data (about 3 PB) for ML training.
[1] https://github.com/seaweedfs/seaweedfs
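The "O(1) disk seek" claim comes from keeping every blob's location in memory, so a read never has to walk on-disk metadata. Below is a minimal sketch of that idea in the style of Facebook's Haystack design, which SeaweedFS credits as its inspiration; it is not SeaweedFS's actual volume format, and `Volume`, `put`, and `get` are hypothetical names. An in-memory `io.BytesIO` stands in for one large append-only volume file.

```python
import io


class Volume:
    """Haystack-style append-only volume with an in-memory index (hypothetical sketch)."""

    def __init__(self) -> None:
        self.data = io.BytesIO()  # stands in for one large volume file on disk
        self.index = {}           # needle id -> (offset, size), held entirely in RAM

    def put(self, needle_id: int, blob: bytes) -> None:
        offset = self.data.seek(0, io.SEEK_END)  # always append at the end of the volume
        self.data.write(blob)
        # Re-putting an id only repoints the index; the old bytes become garbage
        self.index[needle_id] = (offset, len(blob))

    def get(self, needle_id: int) -> bytes:
        offset, size = self.index[needle_id]  # in-memory lookup, no disk I/O
        self.data.seek(offset)                # exactly one seek ...
        return self.data.read(size)           # ... followed by one contiguous read


vol = Volume()
vol.put(1, b"temperature-grid-2024-01")
vol.put(2, b"temperature-grid-2024-02")
vol.put(1, b"temperature-grid-2024-01-v2")
print(vol.get(1), vol.get(2))  # b'temperature-grid-2024-01-v2' b'temperature-grid-2024-02'
```

Because overwrites only append and repoint the index, stale bytes stay in the volume until a compaction pass reclaims them; that step is omitted here.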
-
awesome-system-design-resources
Learn System Design concepts and prepare for interviews using free resources.
Project mention: 🔥 17 Best Free GitHub Repositories to Crack System Design Interviews 🛠️ | dev.to | 2024-12-06
11. Awesome System Design Resources
-
grpc-go
There's no Remote Procedure Call built into the protocol. JSON-RPC isn't RPC in itself either.
It's like GraphQL with resolvers.
They have you imagine it's a procedure, but you can ignore that.
Here's the Go gRPC Hello World, where the equivalent of a GraphQL resolver replies directly, without needing a procedure by that name. https://github.com/grpc/grpc-go/blob/master/examples/hellowo...
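The resolver comparison can be illustrated with a toy router. In gRPC over HTTP/2 the method travels as a request path of the form `/<package>.<Service>/<Method>`, and the server looks that path up to find a handler, much as a GraphQL server maps a field to a resolver. The sketch below is hypothetical and not grpc-go's internals: plain dicts stand in for protobuf messages, and returning `{"error": "UNIMPLEMENTED"}` only mimics the gRPC status code of that name.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]  # dicts stand in for protobuf request/reply messages


class Router:
    """Maps a gRPC-style method path to whatever function should answer it."""

    def __init__(self) -> None:
        self.routes: Dict[str, Handler] = {}

    def register(self, path: str, handler: Handler) -> None:
        self.routes[path] = handler

    def dispatch(self, path: str, request: dict) -> dict:
        handler = self.routes.get(path)
        if handler is None:
            return {"error": "UNIMPLEMENTED"}  # mimics the gRPC status of that name
        return handler(request)


def greet(request: dict) -> dict:
    # Like a resolver: nothing requires this function to be called "SayHello"
    return {"message": "Hello " + request["name"]}


router = Router()
router.register("/helloworld.Greeter/SayHello", greet)
print(router.dispatch("/helloworld.Greeter/SayHello", {"name": "world"}))
```

The point of the design is that the "procedure" is only a string key; any callable registered under it answers the call.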
-
conductor
Conductor is an event driven orchestration platform providing durable and highly resilient execution engine for your applications
Project mention: Netflix has open-sourced its Maestro Workflow Orchestrator | news.ycombinator.com | 2024-07-22
I'm a bit confused about what is going on here: This project appears to use Netflix/conductor [0]. But you go to that repo, you see it has been archived, with a message saying it is replaced by Netflix's internal non-OSS version, and by unmentioned community forks – by which I assume they mean Orkes Conductor [1]. But this isn't using Orkes Conductor, it looks like it is using the discontinued Netflix version `com.netflix.conductor:conductor-core:2.31.5` [2] – and an outdated version of it too.
[0] https://github.com/Netflix/conductor
[1] https://github.com/conductor-oss/conductor
[2] https://github.com/Netflix/maestro/blob/e8bee3f1625d3f31d84d...
-
NATS
Project mention: CNCF tells main NATS contributor Synadia that it's free to fork off | news.ycombinator.com | 2025-04-29
-
rqlite
Project mention: The definitive guide to using Django with SQLite in production 💡 | dev.to | 2025-01-18
rqlite: The lightweight, user-friendly, distributed relational database built on SQLite
-
system-design
A resource to help busy software engineers become good at system design 👇 (by systemdesign42)
“System Design Newsletter” by Neo Kim
-
Nomad
Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
20k+ nodes and 200k+ allocs. To be fair, Kubernetes cannot support a cluster this large.
Most of my issues with it aren't related to the scale though. I wasn't involved in the operations of the cluster, I was just a user of Nomad trying to run a few thousand stateful allocs. Without custom resources and custom controllers, managing stateful services was a pain in the ass. Critical bugs would also often take years to get fixed. I had lots of fun getting paged in the middle of the night because 2 allocs would suddenly decide they now have the same index (https://github.com/hashicorp/nomad/issues/10727)
-
temporal
Project mention: Launch HN: Stack Auth (YC S24) – An Open-Source Auth0/Clerk Alternative | news.ycombinator.com | 2024-08-08
Just for clarification: so I can't really host this without open-sourcing my product (since your server is AGPL)? Isn't it a stretch to call this open source? Compare it to something like Temporal, which I can self-host without worrying (and which I believe is MIT-licensed [https://github.com/temporalio/temporal/blob/main/LICENSE]).
-
Akka
A platform to build and run apps that are elastic, agile, and resilient. SDK, libraries, and hosted environments.
-
Apache ZooKeeper
ZooKeeper is a distributed coordination service that older versions of Kafka used to manage cluster metadata, leader election, and configuration. It keeps that shared state consistent and synchronized across Kafka brokers.
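Leader election on ZooKeeper is usually built with its standard recipe: each candidate creates an ephemeral, sequential znode under an election path, the lowest sequence number wins, and every other candidate watches only its immediate predecessor. The sketch below simulates that recipe in memory; `FakeZooKeeper`, the `/election/n_` path, and the broker names are hypothetical, no real client library is used, and this is not necessarily how Kafka itself elected its controller.

```python
from dataclasses import dataclass, field

PREFIX = "/election/n_"  # hypothetical election path


@dataclass
class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, no networking)."""
    counter: int = 0
    nodes: dict = field(default_factory=dict)  # znode path -> owning session

    def create_ephemeral_sequential(self, prefix: str, session: str) -> str:
        # ZooKeeper appends a 10-digit, zero-padded, increasing counter to the name
        path = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.nodes[path] = session
        return path

    def expire_session(self, session: str) -> None:
        # Ephemeral znodes are deleted automatically when their owner's session ends
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}

    def children(self, prefix: str) -> list:
        # Zero padding makes lexicographic order match numeric order
        return sorted(p for p in self.nodes if p.startswith(prefix))


def current_leader(zk: FakeZooKeeper):
    """The candidate holding the lowest sequence number is the leader."""
    candidates = zk.children(PREFIX)
    return zk.nodes[candidates[0]] if candidates else None


def node_to_watch(zk: FakeZooKeeper, my_path: str):
    """Non-leaders watch only their immediate predecessor, avoiding a herd effect."""
    candidates = zk.children(PREFIX)
    i = candidates.index(my_path)
    return candidates[i - 1] if i > 0 else None


zk = FakeZooKeeper()
paths = {b: zk.create_ephemeral_sequential(PREFIX, b) for b in ("broker-1", "broker-2", "broker-3")}
print(current_leader(zk))                    # broker-1
print(node_to_watch(zk, paths["broker-3"]))  # /election/n_0000000001
zk.expire_session("broker-1")                # the leader's session dies
print(current_leader(zk))                    # broker-2
```

Because the znodes are ephemeral, a crashed leader's session expiry removes its node, and the next-lowest candidate takes over without extra coordination.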
-
juicefs
Object Storage: JuiceFS, MinIO
-
NebulaGraph Database
A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)
-
Trino
Traditional databases (PostgreSQL, MySQL, etc.) store their data in proprietary formats. Each format is optimized for its own engine and can’t be read directly by anything else. Even if something like Trino can connect to Postgres, it still runs queries through Postgres itself rather than reading Postgres's storage directly. You’re just a client.
-
Distributed Systems discussion
Distributed Systems related posts
-
Kronotop: Horizontally scalable, distributed, transactional document database
-
What If We Could Rebuild Kafka from Scratch?
-
Py4J: Enables Python programs to dynamically access arbitrary Java objects
-
My Learnings About Etcd
-
Invoice Processing With AutoKitteh
-
Longhorn: Cloud native distributed block storage for Kubernetes
-
Building Stateful AI Research Agent with openai-agents and AutoKitteh
-
A note from our sponsor - InfluxDB
influxdata.com | 30 Apr 2025
Index
What are some of the best open-source Distributed System projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | advanced-java | 77,500 |
2 | awesome-scalability | 61,670 |
3 | etcd | 49,173 |
4 | Dubbo | 40,923 |
5 | system-design (Karan Pratap Singh) | 35,296 |
6 | spacedrive | 34,330 |
7 | xgboost | 26,861 |
8 | nsq | 25,233 |
9 | seaweedfs | 24,210 |
10 | awesome-system-design-resources | 22,623 |
11 | grpc-go | 21,776 |
12 | conductor | 20,584 |
13 | NATS | 16,968 |
14 | rqlite | 16,521 |
15 | system-design (systemdesign42) | 15,417 |
16 | Nomad | 15,397 |
17 | temporal | 13,760 |
18 | Akka | 13,144 |
19 | Apache ZooKeeper | 12,458 |
20 | juicefs | 11,531 |
21 | NebulaGraph Database | 11,289 |
22 | Trino | 11,216 |
23 | awesome-distributed-systems | 11,018 |