A Compressed Indexable Bitset

Folly

Mentions: 90 | Stars: 27,072 | Activity: 9.8 | Language: C++

An open-source C++ library developed and used at Facebook.

The core Elias-Fano (EF) algorithm implemented in folly [3] may be a bit faster, and implementing partitioning on top of it is relatively easy.
It would definitely compress much better than roaring bitmaps. In terms of performance, it depends on the access patterns. If access is very sparse (large jumps), partitioned Elias-Fano (PEF) would likely be faster; if it is dense (visiting a large fraction of the bitmap), it would be slower.
It is possible to squeeze a bit more compression out of PEF by introducing a chunk type for Elias-Fano of the chunk complement (for very dense chunks), but you lose the operation of skipping to a given position. Inverted indexes do not need that operation, however: you only need to skip past a given id, which can be supported efficiently. That is not mentioned in the paper because at the time I thought the skip-to-position operation was non-negotiable.
[1] https://github.com/ot/ds2i/
[2] https://github.com/pisa-engine/pisa
[3] https://github.com/facebook/folly/blob/main/folly/experiment...
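
To make the two operations in the comment above concrete, here is a minimal, unoptimized sketch of plain (unpartitioned) Elias-Fano in Python. It is not the folly, ds2i, or PISA implementation; the class name `EliasFano` and the methods `select` and `next_geq` are names chosen for this illustration. `select(i)` corresponds to "skip to a given position" and `next_geq(x)` to "skip past a given id". Real implementations replace the linear bit scans below with precomputed select/skip pointers.

```python
class EliasFano:
    """Toy Elias-Fano encoding of a sorted list of non-negative integers."""

    def __init__(self, values, universe):
        # Assumes values is sorted and every value is < universe.
        self.n = len(values)
        # Low bits per element: floor(log2(universe / n)), never negative.
        self.low_bits = max(0, (universe // max(self.n, 1)).bit_length() - 1)
        mask = (1 << self.low_bits) - 1
        self.lows = [v & mask for v in values]
        # High parts in unary: element i sets bit (high_i + i).
        # A Python int serves as an arbitrary-length bit vector here.
        self.high = 0
        for i, v in enumerate(values):
            self.high |= 1 << ((v >> self.low_bits) + i)

    def _select1(self, k):
        """Position of the k-th (0-based) set bit, by linear scan."""
        bits = self.high
        pos = -1
        for _ in range(k + 1):
            lowest = bits & -bits          # isolate the lowest set bit
            pos = lowest.bit_length() - 1
            bits ^= lowest                 # clear it and continue
        return pos

    def select(self, i):
        """Return the i-th element (skip to a given position)."""
        high = self._select1(i) - i
        return (high << self.low_bits) | self.lows[i]

    def next_geq(self, x):
        """Return (index, value) of the first element >= x, or (n, None)."""
        hx = x >> self.low_bits
        # Elements whose high part is < hx are the ones before the hx-th zero.
        i = zeros = pos = 0
        while zeros < hx and i < self.n:
            if (self.high >> pos) & 1:
                i += 1
            else:
                zeros += 1
            pos += 1
        # Finish with a short scan inside the bucket.
        while i < self.n:
            v = self.select(i)
            if v >= x:
                return i, v
            i += 1
        return self.n, None


values = [3, 4, 7, 13, 14, 15, 21, 43]
ef = EliasFano(values, universe=64)
print(ef.select(3))      # 13
print(ef.next_geq(16))   # (6, 21)
```

The complement-chunk variant mentioned in the comment would store the ids that are absent from a dense chunk instead; `next_geq` can still be answered from that representation, while `select` by position can no longer be read off directly.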

ds2i

Mentions: 1 | Stars: 141 | Activity: 0.0 | Language: C++

A library of inverted index data structures

pisa

Mentions: 1 | Stars: 855 | Activity: 8.2 | Language: C++

PISA: Performant Indexes and Search for Academia

efg

Mentions: 2 | Stars: 14 | Activity: 4.7 | Language: C

GPU based Compressed Graph Traversal

Btw, core EF is quite efficient on the decoding side even on GPUs. I wanted to do PEF, but that seemed a bit more involved and I didn't have the time to do it. Here's a GPU implementation for graph problems if anyone is interested: https://github.com/pgera/efg

tantivy

Mentions: 48 | Stars: 9,896 | Activity: 9.1 | Language: Rust

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

The roaring bitmap variant is used only for the optional index (1 docid => 0 or 1 value) in the columnar storage (DocValues), not for the inverted index. Since this is used for aggregation, some queries may be a full scan.
The inverted index in tantivy uses bitpacked values of 128 elements with a skip index on top.
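
As a rough illustration of "bitpacked blocks of 128 with a skip index on top", the Python sketch below delta-encodes sorted doc ids, packs each block at the smallest common bit width, and keeps the last doc id of each block as the skip entry. This is not tantivy's on-disk format; the function names (`pack_block`, `build`, `next_geq`) and the layout are assumptions made for this example, and doc ids are assumed to be strictly increasing.

```python
from bisect import bisect_left

BLOCK = 128  # elements per block, as described in the comment


def pack_block(deltas):
    """Pack non-negative ints using the smallest bit width that fits all."""
    width = max(d.bit_length() for d in deltas)
    packed = 0
    for i, d in enumerate(deltas):
        packed |= d << (i * width)
    return width, len(deltas), packed


def unpack_block(width, count, packed):
    """Inverse of pack_block."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


def build(doc_ids):
    """Return (skip, blocks); skip[b] is the last doc id in block b."""
    skip, blocks, prev = [], [], 0
    for start in range(0, len(doc_ids), BLOCK):
        chunk = doc_ids[start:start + BLOCK]
        deltas = []
        for d in chunk:
            deltas.append(d - prev)  # gap to the previous doc id
            prev = d
        blocks.append(pack_block(deltas))
        skip.append(chunk[-1])
    return skip, blocks


def next_geq(skip, blocks, target):
    """First doc id >= target, or None if there is none."""
    b = bisect_left(skip, target)      # first block whose last id >= target
    if b == len(skip):
        return None
    doc = skip[b - 1] if b > 0 else 0  # base value for this block's deltas
    for delta in unpack_block(*blocks[b]):
        doc += delta
        if doc >= target:
            return doc
    return None


doc_ids = list(range(0, 3000, 7))
skip, blocks = build(doc_ids)
print(len(blocks))                     # 4
print(next_geq(skip, blocks, 1000))    # 1001
```

Only one block is unpacked per lookup: the skip index narrows the search to a block, and the scan inside it touches at most 128 values.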
> I didn't follow the rest of your comment, select is what EF is good at, every other data structure needs a lot more scanning once you land on the right chunk. With BMI2 you can also use the PDEP instruction to accelerate the final select on a 64-bit block
The select for the sparse codec is a [simple array index access](https://github.com/quickwit-oss/tantivy/blob/main/columnar/s...), which is hard to beat. Compression is not good near the 5k threshold, though.
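
The contrast between the two kinds of select discussed above can be sketched as follows. This is an illustration in Python, not tantivy's code: `select_sparse`, `select_dense`, and `select_in_word` are hypothetical names. On x86 CPUs with BMI2, the loop in `select_in_word` is what the PDEP trick replaces: depositing the bit `1 << k` into the word with PDEP and counting trailing zeros gives the same position.

```python
def select_sparse(sorted_ids, k):
    """Sparse container: ids stored as a sorted array, select is an index."""
    return sorted_ids[k]


def select_in_word(word, k):
    """Position of the k-th (0-based) set bit in a 64-bit word."""
    for _ in range(k):
        word &= word - 1                   # clear the lowest set bit
    return (word & -word).bit_length() - 1


def select_dense(words, k):
    """Dense container: a bitmap of 64-bit words; skip by popcount, then select."""
    for i, w in enumerate(words):
        count = bin(w).count("1")          # popcount of this word
        if k < count:
            return i * 64 + select_in_word(w, k)
        k -= count
    raise IndexError("k is out of range")


ids = [1, 5, 64, 70, 200]
words = [0] * 4
for doc in ids:
    words[doc // 64] |= 1 << (doc % 64)

print(select_sparse(ids, 3))    # 70
print(select_dense(words, 3))   # 70
```

A production bitmap would also keep cumulative popcounts per block so that the word scan in `select_dense` does not start from the beginning each time.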

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
inverted-index search-engine Search information-retrieval Frameworks
Post date: 1 Jul 2023
