hegg
RoaringBitmap
hegg | RoaringBitmap | |
---|---|---|
3 | 24 | |
72 | 3,404 | |
- | 1.3% | |
6.7 | 8.5 | |
19 days ago | about 1 month ago | |
Haskell | Java | |
BSD 3-clause "New" or "Revised" License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hegg
- [ANN] E-graphs and equality saturation: hegg 0.1
-
Ask HN: What are some 'cool' but obscure data structures you know about?
Equality graphs (e-graphs) for theorem proving and equality saturation and other equality-related things.
They're awesome data structures that efficiently maintain a congruence relation over many expressions
> At a high level, e-graphs extend union-find to compactly represent equivalence classes of expressions while maintaining a key invariant: the equivalence relation is closed under congruence.
e.g. If I were to represent "f(x)" and "f(y)" in the e-graph, and then said "x == y" (merged "x" and "y" in the e-graph), then the e-graph, by congruence, would be able to tell me that "f(x) == f(y)"
e.g. If I were to represent "a(2/2)", in the e-graph, then say "2/2 == 1", and "x1 == x", by congruence the e-graph would know "a*(2/2) == a" !
The most recent description of e-graphs with an added insight on implementation is https://arxiv.org/pdf/2004.03082.pdf to the best of my knowledge.
P.S: I'm currently implementing them in Haskell https://github.com/alt-romes/hegg
RoaringBitmap
-
Iterating over Bit Sets Quickly
I was recently reading about Roaring https://roaringbitmap.org/ which is a highly optimized compressed bitset implementation. I reccomend reading about it if you are interested in this sort of thing. The talk at https://roaringbitmap.org/talks/ is especially good.
- Roaring Bitmaps
- Roaring bitmaps are compressed bitmaps, can be 100x faster
-
What feature would you like to remove in C++26?
However, I would love compressed (not just packed) bitsets too, which is something different to me. I would make it another class with a similar interface, based on something like roaring. It doesn't need to be in the standard, but it would be nice if the API was a such that one could easily swap implementations.
-
Jaccard Index
As an aside if you find yourself having to compute them on the fly, know that the Roaring Bitmaps libraries is the way to go [1]. The bitmaps are compressed, and can be streamed directly into SIMD computations (batching XORs and popcnts 256 bits wide!). The Jaccard index is just intersection_len / union_len [2] away
[1] https://roaringbitmap.org/
[2] https://roaringbitmap.readthedocs.io/en/latest/#roaringbitma...
-
Looking for fast, space-efficient key-lookup
Use a two stage approach, with a bloom/cuckoo filter stored as a https://roaringbitmap.org/ in memory. Then a secondary key/value store on disk (bolt or anything else).
-
BitSet Vs BigInteger
As an aside, if you're dealing with large bit sets, you might also want to evaluate Roaring Bitmaps.
-
Negative Incentives in Academic Research
Sidetracking a bit the conversation. What a coincidence that the author (Lemire) is also represented on Today's #1 "Ask HN: What are some cool but obscure data structures you know about?" as he is the main contributor of RoaringBitmap https://github.com/RoaringBitmap/RoaringBitmap and one of the main authors of the data structure.
- Ask HN: What are some 'cool' but obscure data structures you know about?
- Roaring bitmaps: A better compressed bitset
What are some alternatives?
Folly - An open-source C++ library developed and used at Facebook.
HyperMinHash-java - Union, intersection, and set cardinality in loglog space
us - An alternative interface to Sia
lucene - Apache Lucene open-source search software
CPython - The Python programming language
CQEngine - Ultra-fast SQL-like queries on Java collections
ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python
Primes - Prime Number Projects in C#/C++/Python
multiversion-concurrency-contro
Feign - Feign makes writing java http clients easier
TablaM - The practical relational programing language for data-oriented applications
maven-compiler-plugin - Apache Maven Compiler Plugin