hashtable-benchmarks
tigerbeetle
| | hashtable-benchmarks | tigerbeetle |
|---|---|---|
| Mentions | 8 | 45 |
| Stars | 29 | 7,059 |
| Growth | - | 5.7% |
| Activity | 4.7 | 9.9 |
| Last commit | 5 months ago | 7 days ago |
| Language | Java | Zig |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hashtable-benchmarks
- Building a faster hash table for high performance SQL joins
Since the blog post mentioned a PR to replace linear probing with Robin Hood, I just wanted to mention that I found bidirectional linear probing to outperform Robin Hood across the board in my Java integer set benchmarks:
https://github.com/senderista/hashtable-benchmarks/blob/mast...
https://github.com/senderista/hashtable-benchmarks/wiki/64-b...
- Ask HN: Who wants to be hired? (December 2023)
https://homes.cs.washington.edu/~magda/papers/wang-cidr17.pd...
I'm most interested in developing high-performance database engines in low-level languages, but open to any challenging systems programming project. I've been working in C++ for the last 3 years, but have written nontrivial projects in Rust and Java as well (e.g., https://github.com/senderista/rotated-array-set, https://github.com/senderista/hashtable-benchmarks). I would enjoy using Rust or Zig on a new project, but I consider the project itself to be much more important than the language it's written in. I am not interested in cryptocurrency, adtech, or fintech projects.
- Factor is faster than Zig
Thanks for the details on your benchmarks. I would like sometime to extend BLP to a more generic setting; as I said I think any trick used with RH would also work with BLP. I just used an integer set because that's all I needed for my use case and it was easy to implement several different approaches for benchmarking. As you note, it favors use cases where the hash function is cheap (or invertible) and elements are cheap to move around.
About your question on load factors: no, the benchmarks are measuring exactly what they claim to be. The hash table constructor divides max data size by load factor to get the table size (https://github.com/senderista/hashtable-benchmarks/blob/mast...), and the benchmark code instantiates each hash table for exactly the measured data set size and load factor (https://github.com/senderista/hashtable-benchmarks/blob/mast...).
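The sizing rule described above can be sketched in a few lines of Java. This is only an illustration, not the repository's code: the class name `TableSizing`, the method name `tableSize`, and the choice to round up are all assumptions made for this example.

```java
// Hypothetical sketch: derive the slot count from the maximum number of
// entries and the target load factor, as the comment above describes.
// Names and the rounding policy are assumptions, not the repository's code.
final class TableSizing {
    private TableSizing() {}

    /** Returns the number of slots needed to hold maxEntries at loadFactor. */
    static int tableSize(int maxEntries, double loadFactor) {
        if (maxEntries < 0) {
            throw new IllegalArgumentException("maxEntries must be non-negative");
        }
        if (!(loadFactor > 0.0 && loadFactor <= 1.0)) {
            throw new IllegalArgumentException("loadFactor must be in (0, 1]");
        }
        // Round up so the table never ends up fuller than the requested load factor.
        long size = (long) Math.ceil(maxEntries / loadFactor);
        if (size > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("table too large for an int-indexed array");
        }
        return (int) size;
    }

    public static void main(String[] args) {
        System.out.println(tableSize(750, 0.75));   // prints 1000
        System.out.println(tableSize(1000, 0.5));   // prints 2000
    }
}
```

With this rule, a benchmark that builds a table for exactly N elements at load factor f measures a table that is filled to (at most) f once all N elements are inserted.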
I can't explain the peaks around 1M in many of the plots; I didn't investigate them at the time and I don't have time now. It could be a JVM artifact, but I did try to use JMH "best practices", and there's no dynamic memory allocation or GC happening during the benchmark at all. It would be interesting to port these tables to Rust and repeat the measurements with Criterion. For more informative graphs I might try a log-linear approach: divide the intervals between the logarithmically spaced data sizes into a fixed number of subintervals (say 4).
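One way to read the log-linear idea is sketched below: take logarithmically spaced sizes (powers of a base) and split each gap between consecutive sizes into a fixed number of equal steps. The class name `LogLinearSizes`, the method signature, and the use of integer powers of a base are assumptions for illustration; the benchmark repository may space its sizes differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "log-linear" benchmark sizes: logarithmically spaced
// endpoints (base^minExp .. base^maxExp), with each interval between
// consecutive endpoints divided linearly into `subintervals` equal steps.
final class LogLinearSizes {
    private LogLinearSizes() {}

    static List<Long> sizes(long base, int minExp, int maxExp, int subintervals) {
        if (base < 2 || minExp < 0 || maxExp < minExp || subintervals < 1) {
            throw new IllegalArgumentException("invalid parameters");
        }
        List<Long> out = new ArrayList<>();
        long lo = 1;
        for (int i = 0; i < minExp; i++) {
            lo *= base; // lo = base^minExp
        }
        for (int e = minExp; e < maxExp; e++) {
            long hi = lo * base;
            for (int i = 0; i < subintervals; i++) {
                long size = lo + (hi - lo) * i / subintervals;
                // Skip repeats, which occur when an interval is narrower
                // than the number of subintervals.
                if (out.isEmpty() || size > out.get(out.size() - 1)) {
                    out.add(size);
                }
            }
            lo = hi;
        }
        out.add(lo); // final endpoint, base^maxExp
        return out;
    }

    public static void main(String[] args) {
        // 4 subintervals between 10^3, 10^4 and 10^5.
        System.out.println(sizes(10, 3, 5, 4));
    }
}
```

The sketch does not guard against `long` overflow for very large exponents; for benchmark data sizes that fit in memory this should not matter.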
- Inside boost::unordered_flat_map
I think "bidirectional linear probing" is an underrated approach (and much simpler): https://github.com/senderista/hashtable-benchmarks/blob/master/src/main/java/set/int64/BLPLongHashSet.java
- A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
I will probably never get around to porting my bidirectional linear probing integer hash set from Java to C++, but I hope someone can try adapting BLP to general C++ hashmaps and hashsets, because it significantly outperforms Robin Hood in my benchmarks.
- Ask HN: Who wants to be hired? (March 2022)
https://homes.cs.washington.edu/~magda/papers/wang-cidr17.pd...
I'm most interested in developing high-performance database engines in low-level languages, but open to any challenging systems programming project. I've been working in C++ for the last 2 years, but have written nontrivial projects in Rust and Java as well (e.g., https://github.com/senderista/rotated-array-set, https://github.com/senderista/hashtable-benchmarks). I would enjoy using Rust or Zig on a new project, but I consider the project itself to be much more important than the language it's written in. I am not interested in cryptocurrency, adtech, or fintech projects.
tigerbeetle
- Redis Re-Implemented with SQLite
I'm waiting for someone to implement the Redis API by swapping out the state machine in TigerBeetle (which was built modularly such that the state machine can be swapped out).
https://tigerbeetle.com/
- The Fastest and Safest Database [video]
I fully agree with what Prime says at the end - Joran has really set a new bar here for all future database presentations.
Hearing that the entire TigerBeetle domain logic lives in a single file [0] (and is intended to be pluggable for other OLTP use cases!) makes it 1000% more tempting to spend the weekend getting up to speed with Zig.
[0] https://github.com/tigerbeetle/tigerbeetle/blob/main/src/sta...
- Building a Scalable Accounting Ledger
Why would you want to build your own accounting ledger from scratch? Accounting is a completely new domain for most engineers, and TigerBeetle (https://tigerbeetle.com/) already solves this problem.
- Tiger Style
- Tigerbeetle's Storage Fault Model
- Factor is faster than Zig
- The Raft Consensus Algorithm
Maelstrom [1], a workbench for learning distributed systems from the creator of Jepsen, includes a simple (model-checked) implementation of Raft and an excellent tutorial on implementing it.
Raft is a simple algorithm, but as others have noted, the original paper includes many correctness details often brushed over in toy implementations. Furthermore, the fallibility of real-world hardware (handling memory/disk corruption and grey failures), the requirements of real-world systems with tight latency SLAs, and a need for things like flexible quorum/dynamic cluster membership make implementing it for production a long and daunting task. The commit histories of etcd and hashicorp/raft, likely the two most battle-tested open source implementations of Raft, which still surface correctness bugs on the regular, tell you all you need to know.
The TigerBeetle team talks in detail [2] about the real-world aspects of distributed systems on imperfect hardware/non-abstracted system models, and why they chose Viewstamped Replication, which predates Paxos but looks more like Raft.
[1]: https://github.com/jepsen-io/maelstrom/
[2]: https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/DE...
- Fastest Branchless Binary Search
- CWE Top Most Dangerous Software Weaknesses
> There is no reason to use a memory unsafe language anymore, except legacy codebases, and that is also slowly but surely diminishing. I'm still yet to hear this amazingly compelling reason that you just need memory unsafe languages. In terms of cost/benefits analysis, memory unsafety is literally all costs.
Tell that to the authors of new memory unsafe languages (like Zig) and creators of new projects in those languages (like https://tigerbeetle.com) :(
- Problems of C, and how Zig addresses them
What are some alternatives?
unordered_dense - A fast & densely stored hashmap and hashset based on robin-hood backward shift deletion
LevelDB - LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
myria - Myria is a scalable Analytics-as-a-Service platform based on relational algebra.
zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
js2scheme
bun - Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
flat_hash_map - A very fast hashtable
reshade - A generic post-processing injector for games and video software.
robin-hood-hashing - Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
rafiki - An open-source, comprehensive Interledger service for wallet providers, enabling them to provide Interledger functionality to their users.
nafeez.xyz - ⚡ My personal website.
Box2D - Box2D is a 2D physics engine for games