Open-source projects categorized as Index

Top 16 Index Open-Source Projects

  • GitHub repo sonic

    🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

    Project mention: Lightning-Fast, Open Source Search | news.ycombinator.com | 2021-05-14

    Typesense seems like a good fully-featured alternative to Elasticsearch. I.e. it's basically a database with fuzzy-search features (schemas, fields, facets, ordering, scoring profiles, etc), and its speed is enabled by holding everything in RAM.

    If you just want the fuzzy-search part (query string -> list of matching document ids) and don't want to pay for GBs of RAM, sonic [1] seems to be an interesting project. It's very fast (μs) and uses very little RAM but doesn't offer DB-like features such as sorting, schemas/fields, scoring etc. It's more of a low-level primitive for building your own search engine than an integrated search engine that's ready to use out of the box.

    [1]: https://github.com/valeriansaliou/sonic
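    The "query string -> list of matching document ids" primitive described above is an inverted index at heart. A minimal Python sketch of that idea follows. It is not sonic's code or protocol, the names (`InvertedIndex`, `push`, `query`) are hypothetical, and it omits the fuzzy matching and on-disk storage a real search backend needs.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index mapping each term to the set of ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}

    @staticmethod
    def tokenize(text):
        # Lowercase and split on runs of non-word characters, dropping empty tokens.
        return [t for t in re.split(r"\W+", text.lower()) if t]

    def push(self, doc_id, text):
        """Index a document's text under doc_id; only the id is stored, not the text."""
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def query(self, text):
        """Return the ids of documents containing every query term (AND semantics)."""
        terms = self.tokenize(text)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = InvertedIndex()
idx.push("doc-1", "Fast, lightweight search backend")
idx.push("doc-2", "Schema-less search")
print(idx.query("search"))       # both ids (set order may vary)
print(idx.query("fast search"))  # {'doc-1'}
```

    Because each posting set holds only document ids, the caller still has to fetch the documents from its own database, which matches the "low-level primitive" role described in the comment.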

  • GitHub repo riot

    Go Open Source, Distributed, Simple and efficient Search Engine (by go-ego)

  • GitHub repo h5ai

    HTTP web server index for Apache httpd, lighttpd and nginx.

    Project mention: Beat lightweight file server for Raspberry pi 3B+ | reddit.com/r/selfhosted | 2021-05-30

    Option 2 - https://larsjung.de/h5ai/

  • GitHub repo Apache Lucene

    Apache Lucene.NET

  • GitHub repo pysonar2

    PySonar2: an advanced semantic indexer for Python

    Project mention: Which Python static analysis tools should I use? | dev.to | 2021-03-02

    Some other tools are also worth mentioning, like PySonar2 (a type inferencer and indexer) and AutoPep8 (which automatically fixes PEP 8 violations). Also, don’t forget to check out the Code Quality mailing list, which currently covers PEP 8, Pyflakes, mccabe, Flake8 and pylint.

  • GitHub repo blast

    Blast is a full text search and indexing server, written in Go, built on top of Bleve. (by mosuka)

  • GitHub repo hypopg

    Hypothetical Indexes for PostgreSQL

    Project mention: PostgreSQL Explain Output Explained | news.ycombinator.com | 2021-05-28

  • GitHub repo rum

    RUM access method - inverted index with additional information in posting lists
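    "Additional information in posting lists" means each entry in a term's posting list can carry more than a document id, for example the term's positions within the document. The Python sketch below is a hypothetical illustration (the names `PositionalIndex`, `add` and `phrase` are made up, and this is not RUM's storage format) of why that helps: a phrase query can be answered from the posting lists alone, without re-reading the documents.

```python
from collections import defaultdict


class PositionalIndex:
    """Toy inverted index whose posting lists keep each term's positions per document."""

    def __init__(self):
        # term -> {doc_id: [positions of the term in that document]}
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        for pos, term in enumerate(text.lower().split()):
            self.postings[term][doc_id].append(pos)

    def phrase(self, text):
        """Return ids of documents containing the query terms as a consecutive phrase."""
        terms = text.lower().split()
        if not terms or any(t not in self.postings for t in terms):
            return set()
        # Candidate documents must contain every term at least once.
        docs = set(self.postings[terms[0]])
        for t in terms[1:]:
            docs &= set(self.postings[t])
        hits = set()
        for doc in docs:
            # The phrase starts at p if the i-th term occurs at position p + i.
            starts = set(self.postings[terms[0]][doc])
            for i, t in enumerate(terms[1:], start=1):
                starts &= {p - i for p in self.postings[t][doc]}
            if starts:
                hits.add(doc)
        return hits
```

    A plain id-only posting list could only report that both words occur somewhere in a document; the stored positions let the index confirm adjacency directly.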

    Project mention: Debugging random slow writes in PostgreSQL | news.ycombinator.com | 2021-05-15

    We have been bitten by the same behavior. I gave a talk with a friend about this exact topic (diagnosing GIN pending list updates) at PGCon 2019 in Ottawa[1][2].

    What you need to know is that the pending list is merged into the main B-tree during several operations, but only one of them is critical for insert performance: the merge during an actual insert. Both vacuum and autovacuum (including autovacuum analyze, but not a direct ANALYZE) merge the pending list, so frequent autovacuums are the first thing you should tune. Merging on insert happens when the pending list exceeds gin_pending_list_limit. In all cases it is also worth knowing which memory parameter governs the rebuild, since that affects how long it takes: work_mem when the merge is triggered on insert, autovacuum_work_mem when it is triggered by autovacuum, and maintenance_work_mem when it is triggered by a call to gin_clean_pending_list().
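    The insert-latency behavior described above can be illustrated with a toy model. In the Python sketch below (hypothetical names, not PostgreSQL's GIN implementation), inserts go to an unsorted pending list, the insert that pushes the list past `pending_limit` pays for merging all of it, and `clean_pending_list()` plays the role of vacuum or `gin_clean_pending_list()` merging it outside the insert path. Here `pending_limit` counts entries, whereas PostgreSQL's `gin_pending_list_limit` is a size in kilobytes; `fastupdate=False` mirrors disabling the pending list.

```python
class PendingListIndex:
    """Toy model of an index with a fast-update pending list."""

    def __init__(self, pending_limit, fastupdate=True):
        self.pending_limit = pending_limit
        self.fastupdate = fastupdate
        self.main = {}     # key -> set of row ids (stands in for the main B-tree)
        self.pending = []  # unsorted (key, row_id) pairs awaiting a merge

    def _merge(self, entries):
        for key, row_id in entries:
            self.main.setdefault(key, set()).add(row_id)

    def insert(self, key, row_id):
        """Insert one entry; return how many entries this call merged into main."""
        if not self.fastupdate:
            # Without a pending list, every insert updates the main structure itself.
            self._merge([(key, row_id)])
            return 1
        self.pending.append((key, row_id))
        if len(self.pending) > self.pending_limit:
            # The unlucky insert that crosses the limit pays for the whole merge.
            return self.clean_pending_list()
        return 0

    def clean_pending_list(self):
        """Merge the pending list into main (vacuum-style); return entries merged."""
        merged = len(self.pending)
        self._merge(self.pending)
        self.pending = []
        return merged

    def lookup(self, key):
        # Lookups must consult both the main structure and the unmerged pending list.
        hits = set(self.main.get(key, set()))
        hits.update(r for k, r in self.pending if k == key)
        return hits
```

    For example, with `pending_limit=3`, eight consecutive inserts return merge sizes `[0, 0, 0, 4, 0, 0, 0, 4]`: most inserts are cheap and every fourth one pays for the whole merge, which is the fluctuation discussed here. Calling `clean_pending_list()` between batches keeps that work off the insert path.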

    What you can do is:

    - tune the size of the pending list (like you did)

    - make sure vacuum runs frequently

    - if you have a bulk-insert-heavy workload (e.g. nightly imports), drop the index and recreate it after inserting the rows (this does not always make sense business-wise; it depends on your app)

    - disable fastupdate: you pay a higher cost per insert but remove the fluctuation when the merge needs to happen

    The first of these was done in the article. However, I believe the author still relies on the list being merged on insert. If vacuum were tuned aggressively along with the limit (vacuum can be tuned per table), the list would be merged out of band of ongoing inserts.

    I also had the pleasure of speaking with one of the main authors of GIN indexes (Oleg Bartunov) during that PGCon. He gave probably the best solution and told me to "just use RUM indexes". RUM[3] indexes are like GIN indexes, but without the pending list and with faster ranking, faster phrase searches and faster timestamp-based ordering. However, RUM is not part of the core PostgreSQL distribution, so it might be hard to get running if you don't control which extensions are loaded into your Postgres instance.

    [1] - video https://www.youtube.com/watch?v=Brt41xnMZqo&t=1s

    [2] - slides https://www.pgcon.org/2019/schedule/attachments/541_Let's%20...

    [3] - https://github.com/postgrespro/rum

  • GitHub repo alexandrie

    An alternative crate registry, implemented in Rust.

    Project mention: Run crates.io and package proxy locally | reddit.com/r/rust | 2021-03-20

    There's also https://github.com/Hirevo/Alexandrie

  • GitHub repo fancy-index

    A responsive Apache index page.

    Project mention: Looking for a self hosted file sharing interface | reddit.com/r/homelab | 2021-04-17

    A couple of examples https://oupala.github.io/apaxy/ http://ramlmn.github.io/Apache-Directory-Listing/ https://github.com/Vestride/fancy-index

  • GitHub repo elastix

    A simple Elasticsearch REST client written in Elixir.

  • GitHub repo idx

    maps, sets and vectors with on-demand secondary indexes.

    Project mention: Doxa - a simple db that copies the functionality of datascript, using meander as a query tool | reddit.com/r/Clojure | 2021-04-17

    You might be able to use my library: https://github.com/wotbrew/idx. It would add a tiny cost for the pulls, but you should be able to beat DataScript on many queries (by using hash indexes directly to the entry vs. B-tree indexes to the eid).
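    idx itself is a Clojure library. The general idea of an on-demand secondary index (a hash index on some field that is built the first time you query by that field and reused afterwards) can be sketched in Python as below. `IndexedList`, `lookup` and `append` are hypothetical names for illustration and do not reflect idx's actual API.

```python
from collections import defaultdict


class IndexedList:
    """A list of dict rows that builds a hash index on a field the first time it is queried."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._indexes = {}  # field -> {value: [row positions]}

    def _index_for(self, field):
        index = self._indexes.get(field)
        if index is None:
            # Built lazily on the first lookup by this field, then reused.
            index = defaultdict(list)
            for pos, row in enumerate(self.rows):
                if field in row:
                    index[row[field]].append(pos)
            self._indexes[field] = index
        return index

    def lookup(self, field, value):
        """Return all rows whose `field` equals `value`, via a hash probe."""
        return [self.rows[pos] for pos in self._index_for(field).get(value, [])]

    def append(self, row):
        # Keep any indexes that already exist in sync with the new row.
        pos = len(self.rows)
        self.rows.append(row)
        for field, index in self._indexes.items():
            if field in row:
                index[row[field]].append(pos)
```

    The first `lookup` on a field pays for a full scan to build its index; later lookups on that field are hash probes, the kind of direct hash access the comment contrasts with B-tree indexes.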

  • GitHub repo StaticTypeInfo

    🏀 Up your type-game. A small C++ library for compile-time type names and type indices.

    Project mention: TheLartians/StaticTypeInfo - A small library for compile-time type names and type indices | reddit.com/r/cpp | 2021-04-21

  • GitHub repo rmi

    A learned index structure
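    A learned index replaces a search structure with a model that predicts where a key sits in a sorted array. In the recursive-model variant, a top-level model picks one of several second-level models, and the chosen model's prediction plus its recorded worst-case error bounds a final binary search. The following is a minimal two-stage Python sketch with linear models, assuming sorted, distinct numeric keys; `TwoStageRMI`, `fit_linear` and `num_leaves` are hypothetical names, and this is not the rmi project's code.

```python
import bisect


def fit_linear(xs, ys):
    """Least-squares fit of y ~ a*x + b; returns a constant model if x has no spread."""
    n = len(xs)
    if n == 0:
        return 0.0, 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


class TwoStageRMI:
    """Two-stage recursive model index over a sorted list of distinct numeric keys."""

    def __init__(self, keys, num_leaves=4):
        self.keys = list(keys)
        self.num_leaves = num_leaves
        positions = range(len(self.keys))
        # Stage 1: one linear model maps a key to a rough position, used only to pick a leaf.
        self.root = fit_linear(self.keys, list(positions))
        buckets = [[] for _ in range(num_leaves)]
        for key, pos in zip(self.keys, positions):
            buckets[self._leaf(key)].append((key, pos))
        # Stage 2: a linear model per leaf, plus its min/max error on the keys it was fit on.
        self.leaves = []
        for bucket in buckets:
            a, b = fit_linear([k for k, _ in bucket], [p for _, p in bucket])
            errs = [p - self._predict(a, b, k) for k, p in bucket] or [0]
            self.leaves.append((a, b, min(errs), max(errs)))

    def _leaf(self, key):
        a, b = self.root
        rough_pos = a * key + b
        leaf = int(rough_pos * self.num_leaves / max(len(self.keys), 1))
        return min(max(leaf, 0), self.num_leaves - 1)

    @staticmethod
    def _predict(a, b, key):
        return int(round(a * key + b))

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if it is absent."""
        a, b, lo_err, hi_err = self.leaves[self._leaf(key)]
        pred = self._predict(a, b, key)
        n = len(self.keys)
        lo = min(max(pred + lo_err, 0), n)
        hi = max(min(pred + hi_err + 1, n), lo)
        # Binary search only inside the leaf's error window, not the whole array.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

    Every stored key lies within its leaf's error window by construction, so lookups of stored keys are exact; a key that is not in the array simply fails the final equality check.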

    Project mention: A Recursive Model learned index structure | news.ycombinator.com | 2021-01-24

  • GitHub repo homeassistant-basestation

    Read and manage power states for your Valve Index® Base Stations in Home Assistant.

    Project mention: Basestation integration for Homeassistant: make your basestations part of your home automation. | reddit.com/r/ValveIndex | 2021-04-07

  • GitHub repo database

    Javascript object based database system. (by foxql)

    Project mention: Simple inverted index, database implementation | news.ycombinator.com | 2021-04-11

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-05-30.


What are some of the best open-source Index projects? This list will help you:

Rank  Project                      Stars
1     sonic                        11,414
2     riot                         5,787
3     h5ai                         4,436
4     Apache Lucene                1,617
5     pysonar2                     1,126
6     blast                        971
7     hypopg                       619
8     rum                          438
9     alexandrie                   274
10    fancy-index                  244
11    elastix                      238
12    idx                          72
13    StaticTypeInfo               56
14    rmi                          51
15    homeassistant-basestation    9
16    database                     8