Top 4 C++ bloom-filter Projects
-
munt-official
Munt is a witness-secured decentralized network for payments, digital assets, finance and more
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
Project mention: Show HN: B-field, a probabilistic key-value data structure (`rust-bfield`) | news.ycombinator.com | 2024-05-22
Very interesting and I'll have to read more to understand how it fully works, but _initially_ the space requirements doesn't seem too impressive? Am I missing something here? Maybe the solution here is more flexible?
One alternative approach for many of these problems is to start with a perfect minimal hash function which hashes your key into a unique number [0, N) and then have a packed array of size N where each element is of a fixed byte size. To look up the value you first use the hash function to get an index, and then you look up in the packed array. There's also no error rate here: This is exact.
PTHash (https://arxiv.org/abs/2104.10402) needs roughly ~4 bits per element to create a perfect minimal hash function.
> Store 1 billion web URLs and assign each of them one of a small number of categories values (n=8) in 2.22Gb (params include ν=8, κ=1, =0.1%; 19 bits per element)
Assuming that "n=8" here means "8 bits" we need 1GB (8 bits * billion) to represent all of the values, and then 500 MB for the hash function (4 bits * billion).
I also don't quite understand what "2.22Gb" here refers to. 19 bits * billion = 2.357 SI-giga bytes = 19 SI-giga bits = 2.212 gibi bytes.
> Store 1 billion DNA or RNA k-mers ("ACGTA...") and associate them with any of the ~500k bacterial IDs current described by NCBI in 6.93Gb (ν=62, κ=4, =0.1%; 59 bits per element)
"~500k bacterial ID" can be represented with 19 bits. 1 billion of these take ~2.3GB, and then we have the additional 500MB for the perfect hash function.
Another data structure which is even more fine-tuned for this problem space is Bumped Ribbon Retrieval (https://arxiv.org/abs/2109.01892) where they have <1% overhead over just storing the plain bit values.
-
cpstl
Copy and Paste standard library (CPSTL) is a repository with a collection of data structure and algorithms in many different languages
-
C++ bloom-filter discussion
C++ bloom-filter related posts
Index
What are some of the best open-source bloom-filter projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | munt-official | 135 |
2 | BuRR | 40 |
3 | cpstl | 13 |
4 | bloom_cpp | 3 |