Top 20 approximate-nearest-neighbor-search Open-Source Projects
-
qdrant
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
-
annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
-
Weaviate
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
-
SPTAG
A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario.
-
hora
🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀 .
-
usearch
Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
-
arroy
Annoy-inspired Approximate Nearest Neighbors in Rust, based on LMDB and optimized for memory usage 💥
-
awesome-vector-database
A curated list of awesome works related to high dimensional structure/vector search & database
-
TileDB-Vector-Search
Cloud-native vector similarity search and storage with efficient, serverless scale-out
Project mention: Ask HN: Has Anyone Trained a personal LLM using their personal notes? | news.ycombinator.com | 2024-04-03
I'm currently looking to implement locally, using QDrant [1] for instance.
I'm just playing around, but it makes sense to have a runnable example for our users at work too :) [2].
[1]. https://qdrant.tech/
The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy.
Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy) has been around since 2014. It was one of the first open source approximate nearest neighbor libraries. Recommendations have always been a good use case for vector similarity.
Recommendations are a natural extension of search and transformers models made building the vectors for natural language possible. To prove the worth of vector search over keyword search, the focus was always on showing how the top N matches include results not possible with keyword search.
In 2023, there has been a shift towards acknowledging keyword search also has value and that a combination of vector + keyword search (aka hybrid search) operates in the sweet spot. Once again this is validated through the same benchmarks which focus on the top 10.
On top of all this, there is also the reality that the vector database space is very crowded and some want to use their performance benchmarks for marketing.
Disclaimer: I am the author of txtai (https://github.com/neuml/txtai), an open source embeddings database
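To make the hybrid search idea from the comment above concrete, here is a minimal sketch in plain Python. It is not taken from txtai, annoy, or any project listed here. The term-overlap keyword score is a simplified stand-in for a real ranking function such as BM25, and the names `hybrid_search`, `keyword_score`, and `alpha`, along with the toy 2-d vectors, are assumptions made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms found in the text (simplified stand-in for BM25)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, docs, alpha=0.5, top_n=10):
    """Blend vector and keyword scores; docs is a list of (text, vector) pairs.

    alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search.
    Returns the top_n (score, text) pairs, best first.
    """
    scored = []
    for text, vec in docs:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Toy 2-d "embeddings", invented for this example only.
docs = [
    ("car repair manual", [1.0, 0.0]),
    ("automobile maintenance guide", [0.9, 0.1]),
    ("banana bread recipe", [0.0, 1.0]),
]
results = hybrid_search("car repair", [1.0, 0.0], docs, alpha=0.5, top_n=2)
```

In this toy data, "automobile maintenance guide" shares no terms with the query, so its keyword score is zero and only the vector component lets it rank above the unrelated recipe. That is the kind of conceptual match the comment says keyword search misses.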
Project mention: pgvecto.rs alternatives - qdrant and Weaviate | libhunt.com/r/pgvecto.rs | 2024-03-13
When talking about vector databases, the market offers both specialized and multi-model options. Most of the major database providers, such as Oracle, PostgreSQL, and MongoDB, to mention a few, have integrated a specific solution for retrieving vector data.
Project mention: Building a Vector Database with Rust to Make Use of Vector Embeddings | /r/rust | 2023-06-01
We have been playing around with Hora as a replacement for the Rust-CV implementation as we want PQ as well. I'll check out instant-distance, looks very interesting!
Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22
Project mention: [D]: Best nearest neighbour search for high dimensions | /r/MachineLearning | 2023-05-17
I'll assume this is the link to pynndescent, looks cool! Thanks for sharing. I haven't used it before. Also seems like it's an approximate nearest neighbor algorithm, just FYI for others seeing this.
Project mention: Practical Vector Search: NeurIPS 2023 Competition Leaderboard | news.ycombinator.com | 2024-03-01
Why do you use pgvector instead of pgANN? My understanding is pgANN is built with FAISS. When I compared pgvector with FAISS, pgvector was 3-5x slower.
Project mention: Unveiling arroy: Meilisearch's Latest ANNs Innovation with Rust and LMDB – A Nod to Spotify's Anno | dev.to | 2023-12-01
For more information and advanced usage, refer to the official Arroy documentation.
Project mention: Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster | news.ycombinator.com | 2023-10-07
Project mention: Created a smol vector database in my free time. Looking to provide a LangChain integration soon! | /r/LangChain | 2023-05-06
It supports all the basic features like creating an index, inserting vectors and searching through them. Here's the GitHub link if anyone's interested in going over it: https://github.com/0xDebabrata/citrus
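For readers unfamiliar with the three operations mentioned above (creating an index, inserting vectors, searching), here is a minimal sketch in plain Python. It is not the citrus API: the class name `TinyVectorIndex` and its methods are hypothetical, and the search is an exact brute-force scan rather than an approximate one, which is only practical for small collections.

```python
import heapq
import math


class TinyVectorIndex:
    """Hypothetical in-memory index using an exact (brute-force) scan."""

    def __init__(self, dim):
        self.dim = dim
        self.items = {}  # item id -> vector

    def insert(self, item_id, vector):
        """Store a vector under the given id, replacing any previous entry."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.items[item_id] = list(vector)

    def search(self, query, k=3):
        """Return the k nearest (distance, item_id) pairs by Euclidean distance."""
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(query)}")
        candidates = ((math.dist(query, vec), item_id) for item_id, vec in self.items.items())
        return heapq.nsmallest(k, candidates)


index = TinyVectorIndex(dim=2)
index.insert(1, [0.0, 0.0])
index.insert(2, [1.0, 1.0])
index.insert(3, [5.0, 5.0])
hits = index.search([0.9, 0.9], k=2)
nearest_ids = [item_id for _, item_id in hits]
```

Approximate nearest neighbor libraries replace the full scan in `search` with a structure such as a graph or a forest of trees, trading a small amount of recall for much faster queries on large collections.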
- vector search, utilizing TileDB and TileDB Cloud for seamless scaling: https://tiledb.com/blog/why-tiledb-as-a-vector-database (library: https://github.com/TileDB-Inc/TileDB-Vector-Search)
approximate-nearest-neighbor-search related posts
- Vector Database solutions on AWS
- Using pgvector To Locate Similarities In Enterprise Data
- pgvector vs. pgvecto.rs in 2024: A Comprehensive Comparison for Vector Search in PostgreSQL
- Show HN: A fast HNSW implementation in Rust
- Pg_vectorize: The simplest way to do vector search and RAG on Postgres
- USearch SQLite Extensions for Vector and Text Search
- Simplifying the Milvus Selection Process
Index
What are some of the best open-source approximate-nearest-neighbor-search projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | qdrant | 17,718 |
2 | annoy | 12,662 |
3 | Weaviate | 9,436 |
4 | pgvector | 9,067 |
5 | SPTAG | 4,693 |
6 | hora | 2,552 |
7 | usearch | 1,611 |
8 | vald | 1,453 |
9 | pynndescent | 837 |
10 | pecos | 489 |
11 | big-ann-benchmarks | 291 |
12 | pgANN | 289 |
13 | instant-distance | 281 |
14 | arroy | 171 |
15 | awesome-vector-database | 127 |
16 | citrus | 92 |
17 | horapy | 69 |
18 | hora-wasm | 51 |
19 | alvd | 50 |
20 | TileDB-Vector-Search | 44 |