Our great sponsors
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
tigerbeetle
The distributed financial transactions database designed for mission critical safety and performance.
-
fastapi-raft
Python implementation of the Raft Distributed Consensus Algorithm with ASGI + Starlette + FastAPI
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
I had a fun time recently implementing Raft leader election and log replication (i.e. I didn't get to snapshotting/checkpointing) recently. One of the most challenging projects I've tried to do.
I collected all the resources I found useful while doing it here: https://github.com/eatonphil/goraft#references. This includes Diego Ongaro's thesis and his TLA+ spec.
Some people say Figure 2 of the Raft paper has everything you need but I'm pretty sure that's just not true. It's a little bit more vague than looking at the TLA+ spec to me anyway.
Maelstrom [1], a workbench for learning distributed systems from the creator of Jepsen, includes a simple (model-checked) implementation of Raft and an excellent tutorial on implementing it.
Raft is a simple algorithm, but as others have noted, the original paper includes many correctness details often brushed over in toy implementations. Furthermore, the fallibility of real-world hardware (handling memory/disk corruption and grey failures), the requirements of real-world systems with tight latency SLAs, and a need for things like flexible quorum/dynamic cluster membership make implementing it for production a long and daunting task. The commit history of etcd and hashicorp/raft, likely the two most battle-tested open source implementations of raft that still surface correctness bugs on the regular tell you all you need to know.
The tigerbeetle team talks in detail about the real-world aspects of distributed systems on imperfect hardware/non-abstracted system models, and why they chose viewstamp replication, which predates Paxos but looks more like Raft.
[1]: https://github.com/jepsen-io/maelstrom/
[2]: https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/DE...
Maelstrom [1], a workbench for learning distributed systems from the creator of Jepsen, includes a simple (model-checked) implementation of Raft and an excellent tutorial on implementing it.
Raft is a simple algorithm, but as others have noted, the original paper includes many correctness details often brushed over in toy implementations. Furthermore, the fallibility of real-world hardware (handling memory/disk corruption and grey failures), the requirements of real-world systems with tight latency SLAs, and a need for things like flexible quorum/dynamic cluster membership make implementing it for production a long and daunting task. The commit history of etcd and hashicorp/raft, likely the two most battle-tested open source implementations of raft that still surface correctness bugs on the regular tell you all you need to know.
The tigerbeetle team talks in detail about the real-world aspects of distributed systems on imperfect hardware/non-abstracted system models, and why they chose viewstamp replication, which predates Paxos but looks more like Raft.
[1]: https://github.com/jepsen-io/maelstrom/
[2]: https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/DE...
I've written a whole SQLite replication system that works on top of RAFT ( https://github.com/maxpert/marmot ). Best part is RAFT has a well understood and strong library ecosystem as well. I started of with libraries and when I noticed I am reimplementing distributed streams, I just took off the shelf implementation (https://docs.nats.io/nats-concepts/jetstream) and embedded it in system. I love the simplicity and reasoning that comes with RAFT. However I am playing with epaxos these days (https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf), because then I can truly decentralize the implementation for truly masterless implementation. Right now I've added sharding mechanism on various streams so that in high load cases masters can be distributed across nodes too.
I had to implement Raft for a network programming course during my bachelors and I had the same experience regarding how gentle the paper was. Especially for people new to distributed algorithms, I can really recommend it.
My implementation is probably not that great, but I put it online anyway if anyone is interested: https://github.com/skowalak/fastapi-raft/