Top 16 hash-table Open-Source Projects

garnet

4 9,063 9.3 C#

Garnet is a remote cache-store from Microsoft Research that offers strong performance (throughput and latency), scalability, storage, recovery, cluster sharding, key migration, and replication features. Garnet can work with existing Redis clients.

Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09

You would be surprised by performance of modern .NET :)
Writing no-alloc is oftentimes done by reducing complexity and not doing "stupid" tricks that actually work against JIT and CoreLib features.
For databases specifically, .NET is actually positioned very well with its low-level features (intrisics incl. SIMD, FFI, struct generics though not entirely low-level) and high-throughput GC.
Interesting example of this applied in practice is Garnet[0]/FASTER[1]. Keep in mind that its codebase still consist of un-idiomatic C# and you can do way better by further simplification, but it already does the job well enough.
[0] https://github.com/microsoft/garnet
[1] https://github.com/microsoft/FASTER

FASTER

8 6,199 7.2 C#

Fast persistent recoverable log and key-value store + cache, in C# and C++.

Project mention: A MySQL compatible database engine written in pure Go | news.ycombinator.com | 2024-04-09

You would be surprised by performance of modern .NET :)
Writing no-alloc is oftentimes done by reducing complexity and not doing "stupid" tricks that actually work against JIT and CoreLib features.
For databases specifically, .NET is actually positioned very well with its low-level features (intrisics incl. SIMD, FFI, struct generics though not entirely low-level) and high-throughput GC.
Interesting example of this applied in practice is Garnet[0]/FASTER[1]. Keep in mind that its codebase still consist of un-idiomatic C# and you can do way better by further simplification, but it already does the job well enough.
[0] https://github.com/microsoft/garnet
[1] https://github.com/microsoft/FASTER

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
data-structures

1 2,797 0.0 Java

A collection of powerful data structures (by williamfiset)
pogreb

4 1,224 5.3 Go

Embedded key-value store for read-heavy workloads written in Go

Project mention: Sparkey is a simple constant key/value storage library | news.ycombinator.com | 2024-01-04

robin-map

10 1,171 5.7 C++

C++ implementation of a fast hash map and hash set using robin hood hashing

Project mention: Factor is faster than Zig | news.ycombinator.com | 2023-11-10

In my example the table stores the hash codes themselves instead of the keys (because the hash function is invertible)
Oh, I see, right. If determining the home bucket is trivial, then the back-shifting method is great. The issue is just that it’s not as much of a general-purpose solution as it may initially seem.
“With a different algorithm (Robin Hood or bidirectional linear probing), the load factor can be kept well over 90% with good performance, as the benchmarks in the same repo demonstrate.”
I’ve seen the 90% claim made several times in literature on Robin Hood hash tables. In my experience, the claim is a bit exaggerated, although I suppose it depends on what our idea of “good performance” is. See these benchmarks, which again go up to a maximum load factor of 0.95 (Although boost and Absl forcibly grow/rehash at 0.85-0.9):
https://strong-starlight-4ea0ed.netlify.app/
Tsl, Martinus, and CC are all Robin Hood tables (https://github.com/Tessil/robin-map, https://github.com/martinus/robin-hood-hashing, and https://github.com/JacksonAllan/CC, respectively). Absl and Boost are the well-known SIMD-based hash tables. Khash (https://github.com/attractivechaos/klib/blob/master/khash.h) is, I think, an ordinary open-addressing table using quadratic probing. Fastmap is a new, yet-to-be-published design that is fundamentally similar to bytell (https://www.youtube.com/watch?v=M2fKMP47slQ) but also incorporates some aspects of the aforementioned SIMD maps (it caches a 4-bit fragment of the hash code to avoid most key comparisons).
As you can see, all the Robin Hood maps spike upwards dramatically as the load factor gets high, becoming as much as 5-6 times slower at 0.95 vs 0.5 in one of the benchmarks (uint64_t key, 256-bit struct value: Total time to erase 1000 existing elements with N elements in map). Only the SIMD maps (with Boost being the better performer) and Fastmap appear mostly immune to load factor in all benchmarks, although the SIMD maps do - I believe - use tombstones for deletion.
I’ve only read briefly about bi-directional linear probing – never experimented with it.

Hopscotch map

3 698 3.7 C++

C++ implementation of a fast hash map and hash set using hopscotch hashing
ordered-map

2 500 3.5 C++

C++ hash map and hash set which preserve the order of insertion
WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
komihash

6 178 6.7 C

Very fast, high-quality hash function, discrete-incremental and streamed hashing-capable (non-cryptographic, inline C/C++) 26GB/s + PRNG
preshed

1 78 4.1 Cython

💥 Cython hash tables that assume keys are pre-hashed

Project mention: Is anyone using PyPy for real work? | news.ycombinator.com | 2023-07-31

If you have very large dicts, you might find this hash table I wrote for spaCy helpful: https://github.com/explosion/preshed . You need to key the data with 64-bit keys. We use this wrapper around murmurhash for it: https://github.com/explosion/murmurhash
There's no docs so obviously this might not be for you. But the software does work, and is efficient. It's been executed many many millions of times now.

leetcode-swift

1 56 9.4 Swift

TOP 200 #Dev 🏆 LeetCode, Solutions in  Swift, Shell, Database (T-SQL, PL/SQL, MySQL), Concurrency (Python3). @ S. Leschev. Google Engineering Level: L6+ (by sergeyleschev)

Project mention: LeetCode Hard, last two problems: 2809. Minimum Time to Make Array Sum At Most x & 2813. Max Elegance of a K-Length Subseq. | dev.to | 2023-08-12

fph-table

3 37 4.0 C++

Flash Perfect Hash Table: an implementation of a dynamic perfect hash table, extremely fast for lookup
Abstract-Data-Types

4 34 5.3 C

A set of efficient data structures in C, created in a generic way
hashtable-bench

1 12 1.7 Jupyter Notebook

A benchmark for hash tables and hash functions in C++, evaluate on different data as comprehensively as possible
HashTableBenchmark

10 10 0.0 C++

A simple cross-platform speed & memory-efficiency benchmark for the most common hash-table implementations in the C++ world
qc-hash

1 10 5.7 C++

Extremely fast unordered map and set library for C++20
Generic-C-DataStructures

3 1 4.3 C

A repository for code I wrote while learning to implement generic data structures in C
SaaSHub

www.saashub.com sponsored

SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

hash-table related posts

Fast persistent recoverable log and key-value store
3 projects | news.ycombinator.com | 24 Feb 2024
Sparkey is a simple constant key/value storage library
5 projects | news.ycombinator.com | 4 Jan 2024
Sources to learn Data structure implementation in C
2 projects | /r/C_Programming | 11 Jan 2023
Data structures library
1 project | /r/learnprogramming | 16 Dec 2022
Abstract Data Structures Library
1 project | /r/C_Programming | 16 Dec 2022
GitHub - microsoft/FASTER: Fast persistent recoverable log and key-value store + cache, in C# and C++.
1 project | /r/GodotCSharp | 8 Nov 2022
How to create std::map that preserves the order of insertion just using standard C++?
1 project | /r/cpp_questions | 17 Sep 2022
A note from our sponsor - WorkOS
workos.com | 29 Apr 2024

The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning. Learn more →

Index

What are some of the best open-source hash-table projects? This list will help you:

	Project	Stars
1	garnet	9,063
2	FASTER	6,199
3	data-structures	2,797
4	pogreb	1,224
5	robin-map	1,171
6	Hopscotch map	698
7	ordered-map	500
8	komihash	178
9	preshed	78
10	leetcode-swift	56
11	fph-table	37
12	Abstract-Data-Types	34
13	hashtable-bench	12
14	HashTableBenchmark	10
15	qc-hash	10
16	Generic-C-DataStructures	1