Top 20 approximate-nearest-neighbor-search Open-Source Projects

qdrant

139 17,718 9.9 Rust

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Project mention: Ask HN: Has Anyone Trained a personal LLM using their personal notes? | news.ycombinator.com | 2024-04-03

I'm currently looking to implement locally, using QDrant [1] for instance.
I'm just playing around, but it makes sense to have a runnable example for our users at work too :) [2].
[1]. https://qdrant.tech/

annoy

40 12,662 5.3 C++

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Project mention: Do we think about vector dbs wrong? | news.ycombinator.com | 2023-09-05

The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy.
Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy) has been around since 2014. It was one of the first open source approximate nearest neighbor libraries. Recommendations have always been a good use case for vector similarity.
Recommendations are a natural extension of search and transformers models made building the vectors for natural language possible. To prove the worth of vector search over keyword search, the focus was always on showing how the top N matches include results not possible with keyword search.
In 2023, there has been a shift towards acknowledging keyword search also has value and that a combination of vector + keyword search (aka hybrid search) operates in the sweet spot. Once again this is validated through the same benchmarks which focus on the top 10.
On top of all this, there is also the reality that the vector database space is very crowded and some want to use their performance benchmarks for marketing.
Disclaimer: I am the author of txtai (https://github.com/neuml/txtai), an open source embeddings database
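
The comment above describes hybrid search as a combination of vector and keyword results but does not say how the two result lists are merged. One widely used option is reciprocal rank fusion (RRF), sketched below as an illustration only. It is not presented as the method used by txtai or any project on this list. The function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both keyword and vector search rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both retrievers (doc_a and doc_c here) outrank those found by only one, which is the behavior hybrid search aims for.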

Weaviate

76 9,436 10.0 Go

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

Project mention: pgvecto.rs alternatives - qdrant and Weaviate | libhunt.com/r/pgvecto.rs | 2024-03-13

pgvector

77 9,067 9.7 C

Open-source vector similarity search for Postgres

Project mention: Vector Database solutions on AWS | dev.to | 2024-03-28

Vector databases on the market fall into two groups: specialized ones and multi-model ones. Most of the major database providers, such as Oracle, PostgreSQL, and MongoDB, to mention a few, have integrated a dedicated solution for retrieving vector data.

SPTAG

4 4,693 5.1 C++

A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario.

hora

9 2,552 0.0 Rust

🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀 .

Project mention: Building a Vector Database with Rust to Make Use of Vector Embeddings | /r/rust | 2023-06-01

We have been playing around with Hora as a replacement for the Rust-CV implementation as we want PQ as well. I'll check out instant-distance, looks very interesting!

usearch

20 1,611 9.8 C++

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22

vald

13 1,453 9.4 Go

Vald. A Highly Scalable Distributed Vector Search Engine

Project mention: What is the reason for using go mod replace like this? | /r/golang | 2023-04-24

pynndescent

4 837 6.5 Python

A Python nearest neighbor descent for approximate nearest neighbors

Project mention: [D]: Best nearest neighbour search for high dimensions | /r/MachineLearning | 2023-05-17

I'll assume this is the link to pynndescent, looks cool! Thanks for sharing. I haven't used it before. Also seems like it's an approximate nearest neighbor algorithm, just FYI for others seeing this.

pecos

1 489 7.4 Python

PECOS - Prediction for Enormous and Correlated Spaces

big-ann-benchmarks

1 291 9.5 Jupyter Notebook

Framework for evaluating ANNS algorithms on billion scale datasets.

Project mention: Practical Vector Search: NeurIPS 2023 Competition Leaderboard | news.ycombinator.com | 2024-03-01

pgANN

2 289 0.0 Python

Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.

Project mention: Pinecone raises $100M Series B | news.ycombinator.com | 2023-04-27

Why do you use pgvector instead of pgANN? My understanding is pgANN is built with FAISS. When I compared pgvector with FAISS, pgvector was 3-5x slower.
https://github.com/netrasys/pgANN

instant-distance

7 281 5.6 Rust

Fast approximate nearest neighbor searching in Rust, based on HNSW index

Project mention: Show HN: A fast HNSW implementation in Rust | news.ycombinator.com | 2024-03-14

arroy

2 171 9.5 Rust

Annoy-inspired Approximate Nearest Neighbors in Rust, based on LMDB and optimized for memory usage :boom:

Project mention: Unveiling arroy: Meilisearch's Latest ANNs Innovation with Rust and LMDB – A Nod to Spotify's Anno | dev.to | 2023-12-01

For more information and advanced usage, refer to the official Arroy documentation.

awesome-vector-database

1 127 9.0

A curated list of awesome works related to high dimensional structure/vector search & database

Project mention: Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster | news.ycombinator.com | 2023-10-07

citrus

1 92 7.6 Python

(distributed) vector database (by 0xDebabrata)

Project mention: Created a smol vector database in my free time. Looking to provide a LangChain integration soon! | /r/LangChain | 2023-05-06

It supports all the basic features like creating an index, inserting vectors and searching through them. Here's the GitHub link if anyone's interested in going over it: https://github.com/0xDebabrata/citrus
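
To make the three operations mentioned above concrete (creating an index, inserting vectors, searching), here is a minimal exact-search sketch in plain Python. It is not the citrus API. The class name `TinyVectorIndex` and its methods are hypothetical, and it scans every stored vector by brute force rather than using an approximate index such as HNSW.

```python
import math


def _normalize(vector):
    """Return a unit-length copy of the vector (rejects the zero vector)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot index or query a zero vector")
    return [x / norm for x in vector]


class TinyVectorIndex:
    """Hypothetical in-memory index using exact cosine similarity."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []
        self.vectors = []

    def insert(self, item_id, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.ids.append(item_id)
        self.vectors.append(_normalize(vector))

    def search(self, query, top_k=3):
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(query)}")
        q = _normalize(query)
        # Dot product of unit vectors equals cosine similarity.
        scored = [
            (item_id, sum(a * b for a, b in zip(q, vec)))
            for item_id, vec in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = TinyVectorIndex(dim=2)
index.insert("a", [1.0, 0.0])
index.insert("b", [0.0, 1.0])
index.insert("c", [1.0, 1.0])

results = index.search([1.0, 0.1], top_k=2)
print([item_id for item_id, _ in results])
```

A brute-force scan like this is exact but costs time linear in the number of stored vectors, which is the cost that approximate nearest neighbor libraries trade some accuracy to avoid.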

horapy

2 69 0.0 Python

🐍 Python binding for the Hora Approximate Nearest Neighbor Search Algorithm library

hora-wasm

2 51 0.0 Rust

WebAssembly binding for Hora Approximate Nearest Neighbor Search Library

alvd

1 50 0.0 Go

alvd = A Lightweight Vald. A lightweight distributed vector search engine that works without K8s.

TileDB-Vector-Search

3 44 9.5 Jupyter Notebook

Cloud-native vector similarity search and storage with efficient, serverless scale-out

Project mention: Ask HN: Who is hiring? (September 2023) | news.ycombinator.com | 2023-09-01

- vector search, utilizing TileDB and TileDB Cloud for seamless scaling: https://tiledb.com/blog/why-tiledb-as-a-vector-database (library: https://github.com/TileDB-Inc/TileDB-Vector-Search)


NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-03.

approximate-nearest-neighbor-search related posts

Vector Database solutions on AWS
1 project | dev.to | 28 Mar 2024
Using pgvector To Locate Similarities In Enterprise Data
2 projects | dev.to | 21 Mar 2024
pgvector vs. pgvecto.rs in 2024: A Comprehensive Comparison for Vector Search in PostgreSQL
1 project | dev.to | 19 Mar 2024
Show HN: A fast HNSW implementation in Rust
6 projects | news.ycombinator.com | 14 Mar 2024
Pg_vectorize: The simplest way to do vector search and RAG on Postgres
6 projects | news.ycombinator.com | 6 Mar 2024
USearch SQLite Extensions for Vector and Text Search
1 project | news.ycombinator.com | 22 Feb 2024
Simplifying the Milvus Selection Process
3 projects | dev.to | 19 Feb 2024

Index

What are some of the best open-source approximate-nearest-neighbor-search projects? This list will help you:

	Project	Stars
1	qdrant	17,718
2	annoy	12,662
3	Weaviate	9,436
4	pgvector	9,067
5	SPTAG	4,693
6	hora	2,552
7	usearch	1,611
8	vald	1,453
9	pynndescent	837
10	pecos	489
11	big-ann-benchmarks	291
12	pgANN	289
13	instant-distance	281
14	arroy	171
15	awesome-vector-database	127
16	citrus	92
17	horapy	69
18	hora-wasm	51
19	alvd	50
20	TileDB-Vector-Search	44