pisa
efg
pisa | efg | |
---|---|---|
1 | 2 | |
859 | 14 | |
1.5% | - | |
8.0 | 4.7 | |
15 days ago | 10 months ago | |
C++ | C | |
Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pisa
-
A Compressed Indexable Bitset
The EF core algorithm implemented in folly [3] may be a bit faster, and implementing partitioning on top of that is relatively easy.
It would definitely compress much better than roaring bitmaps. In terms of performance, it depends on the access patterns. If very sparse (large jumps) PEF would likely be faster, if dense (visit a large fraction of the bitmap) it'd be slower.
It is possible to squeeze a bit more compression out of PEF by introducing a chunk type for Elias-Fano of the chunk complement (for very dense chunks), but you lose the operation of skipping to a given position, which is however not needed in inverted indexes (you only need to skip past a given id, and that can be supported efficiently). That is not mentioned in the paper because at the time I thought the skip-to-position operation was a non-negotiable.
[1] https://github.com/ot/ds2i/
[2] https://github.com/pisa-engine/pisa
[3] https://github.com/facebook/folly/blob/main/folly/experiment...
efg
-
Vectorizing Graph Neural Networks
> I believe you are not guaranteed for the edge data of adjacent nodes to be adjacent in memory
The edge data of a particular node is contiguous, but yes, the edge data of a collection of nodes is not contiguous. You can reorder (permute) the graph for some metric as a preprocessing step so that you get better locality. This only works for static graphs though.
> For float-based edge data I think quantization works well, and I believe you can further compress the ROW/COL indices
Yes, index compression is pretty well studied and understood. The challenge here is mostly good compression ratio and high decompression performance. There are a couple of works that I'm aware of that do this for gpus. This repo by Mo Sha et al. (https://github.com/desert0616/GCGT) is pretty good, and I also did some work in this space (https://github.com/pgera/efg).
-
A Compressed Indexable Bitset
Btw, core EF is quite efficient on the decoding side even on GPUs. I wanted to do PEF, but that seemed a bit more involved and didn't have the time to do it. Here's a GPU implementation for graph problems if anyone is interested: https://github.com/pgera/efg
What are some alternatives?
lucene - Apache Lucene open-source search software
GCGT - Source code for the paper: GPU-based Compressed Graph Traversal
Apache Solr - Apache Lucene and Solr open-source search software
ds2i - A library of inverted index data structures
resin - Vector space search engine. Available as a HTTP service or as an embedded library.
nodevectors - Fastest network node embeddings in the west
MeTA - A Modern C++ Data Sciences Toolkit
Folly - An open-source C++ library developed and used at Facebook.
Typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
duckdb-pgq - DuckDB is an in-process SQL OLAP Database Management System
alexandria - Full text search engine powering Alexandria.org - the open search engine.
tantivy - Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust