-
braft
An industrial-grade C++ implementation of RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly-available distributed systems.
Haha, I totally hear you. But we didn't really build the Raft consensus layer from scratch; we used an existing robust library for that: https://github.com/baidu/braft
-
> And so, if your process crashes and restarts, it first reloads the snapshot, and replays the transaction logs to fully recover the state. (Notice that index changes don’t need to be part of the transaction log. For instance if there’s an index on field bar from Foo, then setBar should just update the index, which will get updated whether it’s read from a snapshot, or from a transaction.)
That’s a database. You even linked to the specific database you’re using [0], which describes itself as:
> […] in-memory database with transactions […]
Am I misunderstanding something?
[0]: https://github.com/bknr-datastore/bknr-datastore
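Whatever you call it, the snapshot-plus-log recovery scheme described in the quote can be sketched as follows. This is a toy illustration, not the linked library's code; all names are made up.

```python
import json
import os

class Store:
    """Toy in-memory store recovered from a snapshot plus a transaction log."""

    def __init__(self, snapshot_path, log_path):
        self.snapshot_path = snapshot_path
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self):
        # 1. Reload the last snapshot, if any.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        # 2. Replay the transaction log recorded since that snapshot.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, op):
        if op["kind"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["kind"] == "delete":
            self.data.pop(op["key"], None)

    def set(self, key, value):
        # Append to the log *before* mutating memory, so a crash between
        # the two steps is recoverable on replay.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"kind": "set", "key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def snapshot(self):
        # Write the snapshot atomically, then truncate the log.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.snapshot_path)
        open(self.log_path, "w").close()
```

Note that, per the quote, derived indexes never appear in the log: they are rebuilt as a side effect of applying each operation, whether the state comes from the snapshot or from replay.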
-
> Imagine all the wonderful things you could build if you never had to serialize data into SQL queries.
This exists in sufficiently mature Actor model[0] implementations, such as Akka Event Sourcing[1], which also addresses:
> But then comes the important part: how do you recover when your process crashes? It turns out that answer is easy, periodically just take a snapshot of everything in RAM.
Intrinsically and without having to create "a new architecture for web development". There are even open source efforts which explore the RAFT protocol using actors here[2] and here[3].
0 - https://en.wikipedia.org/wiki/History_of_the_Actor_model
1 - https://doc.akka.io/docs/akka/current/typed/persistence.html
2 - https://github.com/Michael-Dratch/RAFT_Implementation
3 - https://github.com/invkrh/akka-raft
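The event-sourcing pattern referenced above can be sketched in a few lines: commands are validated against current state, persisted as events, and state is rebuilt by replaying the journal. This is a minimal illustration of the pattern, not Akka's actual API; the names are invented.

```python
class Counter:
    """Minimal event-sourced entity: state is a pure function of the
    durable, append-only event journal."""

    def __init__(self, journal):
        self.journal = journal  # durable, append-only list of events
        self.value = 0
        for event in journal:   # recovery is just replay
            self._apply(event)

    def _apply(self, event):
        kind, amount = event
        if kind == "incremented":
            self.value += amount

    def handle(self, command):
        # Command handler: validate, persist the resulting event,
        # then apply it to in-memory state.
        kind, amount = command
        if kind == "increment" and amount > 0:
            event = ("incremented", amount)
            self.journal.append(event)  # persist before applying
            self._apply(event)
```

Crash recovery "for free" falls out of the replay in the constructor; snapshotting is then just an optimization that bounds replay time.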
-
It just persists its in-memory data structures to disk. Here's the source of an old version; note uses of `diskvar` and `disktable`. A "table" here is just a hashtable.
https://github.com/wting/hackernews/blob/master/news.arc
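The `diskvar` idea is simple enough to sketch: a variable whose value is loaded from a file if one exists, and explicitly saved back. This is a rough Python analogy to what the Arc code does, not its actual semantics.

```python
import json
import os

def diskvar(path, default):
    """Load a persisted value from disk, falling back to a default
    (roughly analogous to Arc's diskvar; illustrative only)."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return default

def save_var(value, path):
    """Persist a value atomically: write to a temp file, then rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(value, f)
    os.replace(tmp, path)
```

A "disktable" is then just a hashtable (here, a dict) stored the same way.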
-
>You could just use sqlite with :memory: for the Raft FSM
That's the basic design rqlite[1] had for its first ~7 years. :-) But rqlite moved to on-disk SQLite, since with WAL mode and 'PRAGMA synchronous=OFF' it's about as fast as writing to RAM. Or at least close enough, and I avoid all the limitations that come with :memory: SQLite databases (a max size of 2GB being one). I should have just used on-disk mode from the start, but only now know better.
(I'm guessing you may know some of this because rqlite uses the same Raft library [2] as Nomad.)
As for the upgrade issue you mention, yes, it's real. Do you find it in the field much with Nomad? I've introduced new Raft entry types very infrequently during rqlite's 10 years of development, and only once did someone hit it in the field. Of course, one way to deal with it is to first release a version of one's software that *understands the new types* but doesn't ever write them. Once that version is fully deployed, upgrade to the version that actually writes the new types too. I've never bothered to do this in practice, and it requires discipline on the part of the end-users too.
[1] https://www.rqlite.io
[2] https://github.com/hashicorp/raft
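The SQLite settings mentioned above are two pragmas issued at connection time. A minimal Python sketch (the helper name is made up):

```python
import sqlite3

def open_fast(path):
    """Open an on-disk SQLite database with WAL journaling and
    synchronous=OFF, trading crash-durability of the most recent
    commits for near-in-memory write speed."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=OFF")
    return conn
```

With synchronous=OFF an OS crash or power loss can lose recently committed transactions, which is tolerable in a setup like rqlite's only because the Raft log, not the SQLite file, is the durable source of truth.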
-
Nomad
Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
> I should have just used on-disk mode from the start, but only now know better.
Yeah, I saw the recent post about reducing rqlite disk space usage. Using the on-disk sqlite as both the FSM and the Raft snapshot makes a lot of sense here. I'm curious whether you've had concerns about write amplification though? Because we have only the periodic Raft snapshots and the FSM is in-memory, during high write volumes we're only really hammering disk with the Raft logs.
> Do you find it in the field much with Nomad? I've managed to introduce new Raft Entry types very infrequently during rqlite's 10-years of development, only once did someone hit it in the field with rqlite.
My understanding is that rqlite Raft entries are mostly SQL statements (is that right?). Where Nomad is somewhat different (and probably closer to the OP) is that the Raft entries are application-level entries. For entries that are commands like "stop this job"[0] upgrades are simple.
The tricky entries are where the entry is "upsert this large deeply-nested object that I've serialized", like the Job or Node (where the workloads run). The typical bug here is you've added a field way down in the guts of one of these objects that's a pointer to a new struct. When old versions deserialize the message they ignore the new field and that's easy to reason about. But if the leader is still on an old version and the new code deserializes the old object, you need to make sure you're not missing any nil pointer checks. Without sum types enforced at compile time (i.e. Option/Maybe), we have to catch all these via code review and a lot of tedious upgrade testing.
> it requires discipline on the part of the end-users too.
Oh for sure. Nomad runs into some commercial realities here around how much discipline we can demand from end-users. =)
[0] https://github.com/hashicorp/nomad/blob/v1.8.2/nomad/fsm.go#...
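The deserialization hazard described above can be sketched as a defensive decode: an entry written by an old version simply lacks the new nested field, so the new code must supply a default instead of assuming the field is present. This is an illustrative Python analogy with made-up field names, not Nomad's code.

```python
import json

def decode_node(raw):
    """Decode an 'upsert this deeply nested object' entry defensively.
    Entries written by old versions lack fields added later, so every
    new nested field needs an explicit default (the nil-check problem)."""
    node = json.loads(raw)
    # Hypothetical field added in the new version: absent in old entries,
    # so reading node["topology"]["zone"] unguarded would blow up.
    if node.get("topology") is None:
        node["topology"] = {"zone": "default"}
    return node
```

With sum types the compiler would force this branch to exist; without them, each such default has to be caught in code review and upgrade testing.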