-
Thanks for the details on your benchmarks. I would like sometime to extend BLP to a more generic setting; as I said I think any trick used with RH would also work with BLP. I just used an integer set because that's all I needed for my use case and it was easy to implement several different approaches for benchmarking. As you note, it favors use cases where the hash function is cheap (or invertible) and elements are cheap to move around.
About your question on load factors: no, the benchmarks are measuring exactly what they claim to be. The hash table constructor divides max data size by load factor to get the table size (https://github.com/senderista/hashtable-benchmarks/blob/mast...), and the benchmark code instantiates each hash table for exactly the measured data set size and load factor (https://github.com/senderista/hashtable-benchmarks/blob/mast...).
I can't explain the peaks around 1M in many of the plots; I didn't investigate them at the time and I don't have time now. It could be a JVM artifact, but I did try to use JMH "best practices", and there's no dynamic memory allocation or GC happening during the benchmark at all. It would be interesting to port these tables to Rust and repeat the measurements with Criterion. For more informative graphs I might try a log-linear approach: divide the intervals between the logarithmically spaced data sizes into a fixed number of subintervals (say 4).
-
InfluxDB
InfluxDB high-performance time series database. Collect, organize, and act on massive volumes of high-resolution data to power real-time intelligent systems.
-
zig
General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
Actually it seems according to the issue that TigerBeetle (one of the bigger zig projects out there) noticed this issue [1]. It's also on their issue tracker [2].
[1] https://github.com/ziglang/zig/issues/17851
-
tigerbeetle
The financial transactions database designed for mission critical safety and performance.
-
My impression from the article is that Zig provides several different hashtables and not all of them are broken in this way.
This reminds me of Aria's comment in her Rust tutorial https://rust-unofficial.github.io/too-many-lists/ about failing to kill LinkedList. One philosophy (and the one Rust chose) for a stdlib is that this is only where things should live when they're so commonly needed that essentially everybody needs them either directly or to talk about. So, HashTable is needed by so much otherwise unrelated software that qualifies, BloomFilter, while it's real useful for some people, not so much. Aria cleaned out Rust's set of standard library containers before Rust 1.0, trying to keep only those most people would need. LinkedList isn't a good general purpose data structure, but, it was too popular and Aria was not able to remove it.
Having multiple hash tables feels like a win (they're optimized for different purposes) but may cost too much in terms of the necessary testing to ensure they all hit the quality you want.
-
-
robin-hood-hashing
Discontinued Fast & memory efficient hashtable based on robin hood hashing for C++11/14/17/20
-
-
CodeRabbit
CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
-