C++ bloom-filter

Open-source C++ projects categorized as bloom-filter

Top 4 C++ bloom-filter Projects

  1. munt-official

    Munt is a witness-secured decentralized network for payments, digital assets, finance and more

  3. BuRR

    Bumped Ribbon Retrieval and Approximate Membership Query (by lorenzhs)

    Project mention: Show HN: B-field, a probabilistic key-value data structure (`rust-bfield`) | news.ycombinator.com | 2024-05-22

    Very interesting and I'll have to read more to understand how it fully works, but _initially_ the space requirements don't seem too impressive? Am I missing something here? Maybe the solution here is more flexible?

    One alternative approach for many of these problems is to start with a minimal perfect hash function, which hashes each key into a unique number in [0, N), and then have a packed array of size N where each element is of a fixed bit width. To look up the value you first use the hash function to get an index, and then you look up that index in the packed array. There's also no error rate here: this is exact.

    PTHash (https://arxiv.org/abs/2104.10402) needs roughly 4 bits per element to create a minimal perfect hash function.

    > Store 1 billion web URLs and assign each of them one of a small number of categories values (n=8) in 2.22Gb (params include ν=8, κ=1, α=0.1%; 19 bits per element)

    Assuming that "n=8" here means "8 bits" we need 1GB (8 bits * billion) to represent all of the values, and then 500 MB for the hash function (4 bits * billion).

    I also don't quite understand what "2.22Gb" here refers to. 19 bits * billion = 2.375 SI-giga bytes = 19 SI-giga bits = 2.212 gibi bytes.

    > Store 1 billion DNA or RNA k-mers ("ACGTA...") and associate them with any of the ~500k bacterial IDs currently described by NCBI in 6.93Gb (ν=62, κ=4, α=0.1%; 59 bits per element)

    "~500k bacterial ID" can be represented with 19 bits. 1 billion of these take ~2.3GB, and then we have the additional 500MB for the perfect hash function.

    Another data structure which is even more fine-tuned for this problem space is Bumped Ribbon Retrieval (https://arxiv.org/abs/2109.01892) where they have <1% overhead over just storing the plain bit values.

  4. cpstl

    Copy and Paste standard library (CPSTL) is a repository with a collection of data structures and algorithms in many different languages

  5. bloom_cpp

    Bloom Filter implementation in C++
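
The comment quoted under BuRR describes an exact alternative to a probabilistic structure: map each key to a unique slot in [0, N), then read a fixed-width value from a packed array. The following is a minimal sketch of that idea, not code from any of the listed projects. The names `bits_needed`, `PackedArray`, and `ToyIndex` are hypothetical. `ToyIndex` is only a stand-in for a real minimal perfect hash function such as PTHash: it stores the keys themselves in a hash map, which a real minimal perfect hash function avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Smallest bit width b (b >= 1) such that 2^b >= n distinct values fit.
unsigned bits_needed(std::uint64_t n) {
    unsigned b = 1;
    while (b < 63 && (std::uint64_t{1} << b) < n) ++b;
    return b;
}

// Packed array of `count` values, each exactly `bits` wide (1..63 bits).
class PackedArray {
public:
    PackedArray(std::size_t count, unsigned bits)
        : count_(count), bits_(bits), words_((count * bits + 63) / 64, 0) {
        assert(bits >= 1 && bits <= 63);
    }

    void set(std::size_t i, std::uint64_t value) {
        const std::uint64_t mask = (std::uint64_t{1} << bits_) - 1;
        const std::size_t bit = i * bits_;
        const std::size_t word = bit / 64;
        const unsigned offset = static_cast<unsigned>(bit % 64);
        value &= mask;
        // Low part goes into the current word.
        words_[word] = (words_[word] & ~(mask << offset)) | (value << offset);
        // If the value straddles a word boundary, the rest goes into the next word.
        if (offset + bits_ > 64) {
            const unsigned spill = offset + bits_ - 64;
            const std::uint64_t hi_mask = (std::uint64_t{1} << spill) - 1;
            words_[word + 1] =
                (words_[word + 1] & ~hi_mask) | (value >> (bits_ - spill));
        }
    }

    std::uint64_t get(std::size_t i) const {
        const std::uint64_t mask = (std::uint64_t{1} << bits_) - 1;
        const std::size_t bit = i * bits_;
        const std::size_t word = bit / 64;
        const unsigned offset = static_cast<unsigned>(bit % 64);
        std::uint64_t v = words_[word] >> offset;
        if (offset + bits_ > 64) v |= words_[word + 1] << (64 - offset);
        return v & mask;
    }

    // Bytes needed for the values alone, ignoring word padding.
    std::size_t payload_bytes() const { return (count_ * bits_ + 7) / 8; }

private:
    std::size_t count_;
    unsigned bits_;
    std::vector<std::uint64_t> words_;
};

// Stand-in for a minimal perfect hash function (assumption: keys are distinct
// and known up front). Unlike a real MPHF, this stores every key.
class ToyIndex {
public:
    explicit ToyIndex(const std::vector<std::string>& keys) {
        for (std::size_t i = 0; i < keys.size(); ++i) slots_[keys[i]] = i;
    }
    // Throws std::out_of_range for unknown keys.
    std::size_t slot(const std::string& key) const { return slots_.at(key); }

private:
    std::unordered_map<std::string, std::size_t> slots_;
};
```

A lookup is then `values.get(index.slot(key))`. With 19-bit values, the width the comment derives for about 500k IDs, `PackedArray::payload_bytes()` for one billion entries gives 2,375,000,000 bytes, matching the 19 bits * billion figure discussed above.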

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

C++ bloom-filter related posts

  • a better way to read from a socket with unusual end-of-stream tokens

    1 project | /r/golang | 4 Dec 2022
  • Gulden becomes Munt

    1 project | /r/munt | 4 Oct 2022
  • Why are white papers so unprofessional?

    1 project | /r/CryptoCurrency | 9 May 2021

Index

What are some of the best open-source bloom-filter projects in C++? This list will help you:

#  Project        Stars
1  munt-official  135
2  BuRR           40
3  cpstl          13
4  bloom_cpp      3

