C++ information-retrieval

Open-source C++ projects categorized as information-retrieval

Top 4 C++ information-retrieval Projects

  • StringZilla

    Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖

  • Project mention: Measuring energy usage: regular code vs. SIMD code | news.ycombinator.com | 2024-02-19

    The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when

    A. you do byte-level processing instead of float words;

    B. you use embedded, IoT, and other low-energy devices.

    A few years ago I compared the Nvidia Jetson Xavier (long before the Orin release), an Intel-based MacBook Pro with a Core i9, and AVX-512-capable CPUs on substring search benchmarks.

    On Xavier one can quite easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J, which was an 8.3x improvement in efficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...
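
The StringZilla description above mentions SWAR (SIMD within a register). As a rough illustration of that idea, and not StringZilla's actual code, the sketch below searches for a single byte eight bytes at a time using ordinary 64-bit integer arithmetic. The function name `find_byte_swar` is hypothetical, and the zero-byte test is the well-known bit trick rather than anything taken from the project.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical SWAR sketch: returns the index of the first byte equal to `c`
// in data[0..n), or n if there is none. Not taken from StringZilla.
std::size_t find_byte_swar(const char* data, std::size_t n, char c) {
    const std::uint64_t ones  = 0x0101010101010101ull;
    const std::uint64_t highs = 0x8080808080808080ull;
    // Broadcast the needle byte into all eight lanes of a 64-bit word.
    const std::uint64_t pattern = ones * static_cast<unsigned char>(c);

    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, 8);   // safe unaligned load
        const std::uint64_t x = word ^ pattern;  // matching bytes become 0x00
        // Non-zero exactly when at least one byte of x is zero.
        const std::uint64_t hit = (x - ones) & ~x & highs;
        if (hit != 0) {
            // Resolve the exact position with a short scalar scan, which
            // keeps the sketch independent of byte order.
            for (std::size_t j = 0; j < 8; ++j) {
                if (data[i + j] == c) return i + j;
            }
        }
    }
    // Scalar tail for the last n % 8 bytes.
    for (; i < n; ++i) {
        if (data[i] == c) return i;
    }
    return n;
}
```

Calling `find_byte_swar(text, length, 'x')` on a buffer that does not contain `'x'` returns `length`. Wider SIMD registers apply the same compare-many-lanes-at-once idea to 16, 32, or 64 bytes per step.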

  • pisa

    PISA: Performant Indexes and Search for Academia

  • Project mention: A Compressed Indexable Bitset | news.ycombinator.com | 2023-07-01

    The Elias-Fano (EF) core algorithm implemented in folly [3] may be a bit faster, and implementing partitioning on top of that is relatively easy.

    It would definitely compress much better than roaring bitmaps. In terms of performance, it depends on the access patterns: if access is very sparse (large jumps), partitioned Elias-Fano (PEF) would likely be faster; if it is dense (visiting a large fraction of the bitmap), it would be slower.

    It is possible to squeeze a bit more compression out of PEF by introducing a chunk type for Elias-Fano of the chunk complement (for very dense chunks), but you lose the operation of skipping to a given position, which is, however, not needed in inverted indexes (you only need to skip past a given id, and that can be supported efficiently). That is not mentioned in the paper because at the time I thought the skip-to-position operation was non-negotiable.

    [1] https://github.com/ot/ds2i/

    [2] https://github.com/pisa-engine/pisa

    [3] https://github.com/facebook/folly/blob/main/folly/experiment...
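
To make the comment above more concrete, here is a minimal sketch of plain (non-partitioned) Elias-Fano encoding with the "skip past a given id" operation. It is an illustration under simplifying assumptions, not the ds2i, PISA, or folly implementation: the class name `EliasFano` and the methods `access`, `next_geq`, and `size_bits` are hypothetical, bits are kept in `std::vector<bool>`, and lookups use linear scans where a real implementation would use select structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, unoptimized Elias-Fano sketch for a sorted list of integers.
class EliasFano {
public:
    // `values` must be sorted (non-decreasing) and each value < `universe`.
    EliasFano(const std::vector<std::uint64_t>& values, std::uint64_t universe)
        : n_(values.size()), low_bits_(0) {
        if (n_ == 0) return;
        // Low bits per element: floor(log2(universe / n)), or 0 if that is < 1.
        while (low_bits_ + 1 < 64 && ((universe / n_) >> (low_bits_ + 1)) != 0) {
            ++low_bits_;
        }
        high_.assign(n_ + (universe >> low_bits_) + 1, false);
        low_.assign(n_ * low_bits_, false);
        std::uint64_t prev = 0;
        for (std::size_t i = 0; i < n_; ++i) {
            const std::uint64_t v = values[i];
            assert(v >= prev && v < universe);
            prev = v;
            // High part in unary: the i-th one bit sits at position high + i.
            high_[(v >> low_bits_) + i] = true;
            // Low part stored verbatim, least significant bit first.
            for (unsigned b = 0; b < low_bits_; ++b) {
                low_[i * low_bits_ + b] = ((v >> b) & 1u) != 0;
            }
        }
    }

    std::size_t size() const { return n_; }
    std::size_t size_bits() const { return high_.size() + low_.size(); }

    // Returns the i-th value (linear scan standing in for select1).
    std::uint64_t access(std::size_t i) const {
        assert(i < n_);
        std::size_t ones = 0;
        for (std::size_t pos = 0; pos < high_.size(); ++pos) {
            if (!high_[pos]) continue;
            if (ones == i) {
                return (static_cast<std::uint64_t>(pos - i) << low_bits_) | low(i);
            }
            ++ones;
        }
        return 0;  // not reached when i < n_
    }

    // Returns the index of the first value >= x, or size() if there is none.
    std::size_t next_geq(std::uint64_t x) const {
        const std::uint64_t target_high = x >> low_bits_;
        std::uint64_t high = 0;  // zeros seen so far = current bucket
        std::size_t index = 0;   // ones seen so far = current element index
        for (std::size_t pos = 0; pos < high_.size(); ++pos) {
            if (!high_[pos]) { ++high; continue; }
            // Buckets below the target are skipped without reading low bits.
            if (high >= target_high) {
                const std::uint64_t value = (high << low_bits_) | low(index);
                if (value >= x) return index;
            }
            ++index;
        }
        return n_;
    }

private:
    std::uint64_t low(std::size_t i) const {
        std::uint64_t v = 0;
        for (unsigned b = 0; b < low_bits_; ++b) {
            v |= static_cast<std::uint64_t>(low_[i * low_bits_ + b]) << b;
        }
        return v;
    }

    std::size_t n_;
    unsigned low_bits_;
    std::vector<bool> high_;  // unary-coded high parts
    std::vector<bool> low_;   // fixed-width low parts
};
```

For the list {3, 4, 7, 13, 14, 15, 21, 43} with universe 64, each value keeps 3 low bits, and `next_geq(16)` returns index 6 (the value 21). A partitioned variant would split the list into chunks and pick an encoding per chunk, which this sketch does not attempt.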

  • ds2i

    A library of inverted index data structures

  • alias

    Productivity app for (super) fast access to content (commands, text, etc.) using tags. (by agudpp)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source information-retrieval projects in C++? This list will help you:

#  Project      Stars
1  StringZilla  1,791
2  pisa           855
3  ds2i           141
4  alias            1
