Top 23 nearest-neighbor-search Open-Source Projects

Milvus

104 26,490 10.0 Go

A cloud-native vector database, storage for next generation AI applications

Project mention: Ask HN: Who is hiring? (April 2024) | news.ycombinator.com | 2024-04-01

Zilliz (zilliz.com) | Hybrid/ONSITE (SF, NYC) | Full-time
I am part of the hiring team for DevRel
NYC - https://boards.greenhouse.io/zilliz/jobs/4307910005
SF - https://boards.greenhouse.io/zilliz/jobs/4317590005
Zilliz is the company behind Milvus (https://github.com/milvus-io/milvus), the most starred vector database on GitHub. Milvus is a distributed vector database that shines in 1B+ vector use cases. Examples include autonomous driving, e-commerce, and drug discovery. (and, of course, RAG)
We are also hiring for other roles where I am not personally involved in the hiring process, such as product managers, software engineers, and recruiters.
qdrant

139 17,718 9.9 Rust

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Project mention: Ask HN: Has Anyone Trained a personal LLM using their personal notes? | news.ycombinator.com | 2024-04-03

I'm currently looking to implement this locally, using Qdrant [1] for instance.
I'm just playing around, but it makes sense to have a runnable example for our users at work too :) [2].
[1]. https://qdrant.tech/
annoy

40 12,662 5.3 C++

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Project mention: Do we think about vector dbs wrong? | news.ycombinator.com | 2023-09-05

The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy.
Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy) has been around since 2014. It was one of the first open source approximate nearest neighbor libraries. Recommendations have always been a good use case for vector similarity.
Recommendations are a natural extension of search, and transformer models made building vectors for natural language possible. To prove the worth of vector search over keyword search, the focus was always on showing how the top N matches include results not possible with keyword search.
In 2023, there has been a shift towards acknowledging keyword search also has value and that a combination of vector + keyword search (aka hybrid search) operates in the sweet spot. Once again this is validated through the same benchmarks which focus on the top 10.
On top of all this, there is also the reality that the vector database space is very crowded and some want to use their performance benchmarks for marketing.
Disclaimer: I am the author of txtai (https://github.com/neuml/txtai), an open source embeddings database
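The hybrid search idea described in the comment above can be illustrated with a minimal sketch. It is not how txtai, annoy, or any project on this list implements it. The function names, the toy three-dimensional vectors, the word-overlap keyword score, and the `alpha` weight are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query words that appear in the text (toy keyword match)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_search(query, query_vec, docs, alpha=0.5, top_n=10):
    """Rank docs by a weighted sum of vector and keyword scores.

    alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search.
    """
    scored = []
    for text, vec in docs:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

# Hypothetical documents with hand-made "embeddings".
DOCS = [
    ("cheap flights to paris", [0.9, 0.1, 0.0]),
    ("budget airfare for france", [0.8, 0.2, 0.1]),
    ("paris hilton biography", [0.1, 0.9, 0.2]),
]

if __name__ == "__main__":
    for score, text in hybrid_search("cheap flights paris", [0.9, 0.1, 0.0], DOCS):
        print(f"{score:.3f}  {text}")
```

With `alpha=0.0` the conceptually related "budget airfare for france" drops to last place because it shares no words with the query, which is the kind of miss the comment attributes to keyword search alone.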
Weaviate

76 9,359 10.0 Go

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

Project mention: pgvecto.rs alternatives - qdrant and Weaviate | libhunt.com/r/pgvecto.rs | 2024-03-13
pgvector

77 8,904 9.7 C

Open-source vector similarity search for Postgres

Project mention: Vector Database solutions on AWS | dev.to | 2024-03-28

When talking about vector databases, the market offers both specialized and multi-model options. Most of the major database providers, such as Oracle, PostgreSQL, and MongoDB, to mention a few, have integrated a specific solution for retrieving vector data.
Smile

9 5,914 9.0 Java

Statistical Machine Intelligence & Learning Engine

Project mention: The Current State of Clojure's Machine Learning Ecosystem | news.ycombinator.com | 2024-04-07

> I don't think it's right to recommend that new users move away from the package because of licensing issues
I was going to chime in to agree but then I saw how this was done - a completely innocuous looking commit:
https://github.com/haifengl/smile/commit/6f22097b233a3436519...
And literally no mention in the release notes:
https://github.com/haifengl/smile/releases/tag/v3.0.0
I think if you are going to change a license, especially in a way that makes it less permissive, you need to be super open and clear about both the fact you are doing it and your reasons for that. This was done so silently that it looks like an intentional attempt to mislead and trick people.
So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.
mlpack

4 4,787 9.9 C++

mlpack: a fast, header-only C++ machine learning library

Project mention: How much C++ is used when it comes to performing quant research? | /r/quant | 2023-07-03

Does C++ have the equivalent of Pandas or Apache Spark? Are there extensive libraries that exist or are being developed that allow you to perform operations with data? Or do people just use a combination of Python and its various libraries (NumPy etc.)? If we leave aside the data bit, are there libraries that allow you to develop ML models in C++ (mlpack, for instance) faster and more efficiently compared to their Python counterparts (scikit-learn)? On a more general note, how does C++ fit into the routine of a Quant Researcher? And at what scale does an organization decide they need to start switching to other languages and spend more time developing the code?
docarray

32 2,730 9.2 Python

Represent, send, store and search multimodal data

Project mention: DocArray – Represent, send, and store multimodal data for ML | news.ycombinator.com | 2023-04-27
usearch

20 1,611 9.8 C++

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22
vald

13 1,451 9.4 Go

Vald. A Highly Scalable Distributed Vector Search Engine

Project mention: What is the reason for using go mod replace like this? | /r/golang | 2023-04-24
pgvecto.rs

17 1,364 9.4 Rust

Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database.

Project mention: My binary vector search is better than your FP32 vectors | dev.to | 2024-03-25

To evaluate the performance metrics in comparison to the original vector approach, we conducted benchmarking using the dbpedia-entities-openai3-text-embedding-3-large-3072-1M dataset. The benchmark was performed on a Google Cloud virtual machine (VM) with specifications of n2-standard-8, which includes 8 virtual CPUs and 32GB of memory. We used pgvecto.rs v0.2.1 as the vector database.
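For readers unfamiliar with binary vector search, here is a minimal sketch of one common approach: keep only the sign of each dimension and compare vectors by Hamming distance. This is an assumption about the general technique, not a description of what the linked post or pgvecto.rs actually does. The function names and sample vectors are hypothetical.

```python
def binarize(vec):
    """Pack the sign of each dimension into the bits of one integer.

    Bit i is set when vec[i] is positive, so a 3072-dimension float
    vector shrinks to 3072 bits.
    """
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of bit positions where two packed vectors differ."""
    return bin(a ^ b).count("1")

def binary_search(query, vectors, top_n=10):
    """Return indices of the top_n vectors closest to query by Hamming distance."""
    q = binarize(query)
    ranked = sorted(range(len(vectors)), key=lambda i: hamming(q, binarize(vectors[i])))
    return ranked[:top_n]

# Hypothetical 4-dimensional vectors standing in for real embeddings.
VECTORS = [
    [0.5, -0.2, 0.1, -0.9],
    [-0.4, 0.3, -0.2, 0.8],
    [0.6, -0.1, -0.3, -0.7],
]

if __name__ == "__main__":
    print(binary_search([0.4, -0.3, 0.2, -0.8], VECTORS))
```

Because binarization discards magnitude, systems that use it often re-rank a larger candidate set with the original float vectors. That step is omitted here.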
awesome-vector-search

20 1,257 6.1

Collections of vector search related libraries, service and research papers

Project mention: Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster | news.ycombinator.com | 2023-10-07
voyager

4 1,142 8.1 C++

🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability. (by spotify)

Project mention: FLaNK Stack for 04 December 2023 | dev.to | 2023-12-04
similarity

7 994 6.5 Python

TensorFlow Similarity is a python package focused on making similarity learning quick and easy.
soundfingerprinting

6 902 8.1 C#

Open source audio fingerprinting in .NET. An efficient algorithm for acoustic fingerprinting written purely in C#.

Project mention: Ask HN: How many of you are self employed? | news.ycombinator.com | 2024-02-05

Started 10 years ago as an open-source project, building an algorithm for audio fingerprinting. Added a commercial offering, selling storage built specifically for audio fingerprints, targeting enterprise customers. Since the offering was too technical (it's hard to sell solutions to problems that are too narrow and domain-specific), pivoted to more "business-oriented problems". This last year's pivot is a chance to finally grow. Running a business in single-player mode is, at times, too stressful. Aside from the technical part, which I very much enjoy, I need to wear marketing, sales, and customer support hats.
[1] - https://emysound.com
pynndescent

4 837 6.5 Python

A Python nearest neighbor descent for approximate nearest neighbors

Project mention: [D]: Best nearest neighbour search for high dimensions | /r/MachineLearning | 2023-05-17

I'll assume this is the link to pynndescent, looks cool! Thanks for sharing. I haven't used it before. Also seems like it's an approximate nearest neighbor algorithm, just FYI for others seeing this.
voy

4 696 7.4 Rust

🕸️🦀 A WASM vector similarity search written in Rust
Project mention: Ask HN: Semantic Vector Searching in WASM? | news.ycombinator.com | 2024-01-03

Would this[1] library help you? It's a Rust vector similarity search engine, written to be compiled to Wasm. I discovered it through articles like these[2].
[1] https://github.com/tantaraio/voy
quaterion

4 619 2.3 Python

Blazing fast framework for fine-tuning similarity learning models
neighbor

1 417 7.5 Ruby

Nearest neighbor search for Rails and Postgres
elastiknn

1 352 8.7 Scala

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
pgANN

2 289 0.0 Python

Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.

Project mention: Pinecone raises $100M Series B | news.ycombinator.com | 2023-04-27

Why do you use pgvector instead of pgANN? My understanding is pgANN is built with FAISS. When I compared pgvector with FAISS, pgvector was 3-5x slower.
https://github.com/netrasys/pgANN
TorchPQ

3 201 3.5 Cuda

Approximate nearest neighbor search with product quantization on GPU in pytorch and cuda
awesome-vector-database

1 127 9.0

A curated list of awesome works related to high dimensional structure/vector search & database

Project mention: Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster | news.ycombinator.com | 2023-10-07

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-07.

nearest-neighbor-search related posts

Vector Database solutions on AWS
1 project | dev.to | 28 Mar 2024
My binary vector search is better than your FP32 vectors
1 project | dev.to | 25 Mar 2024
Unlock Advanced Search Capabilities with Milvus and Read about RAG
1 project | dev.to | 22 Mar 2024
Using pgvector To Locate Similarities In Enterprise Data
2 projects | dev.to | 21 Mar 2024
pgvector vs. pgvecto.rs in 2024: A Comprehensive Comparison for Vector Search in PostgreSQL
1 project | dev.to | 19 Mar 2024
pgvecto.rs alternatives - qdrant and Weaviate
3 projects | 13 Mar 2024
Milvus VS pgvecto.rs - a user suggested alternative
2 projects | 13 Mar 2024

Index

What are some of the best open-source nearest-neighbor-search projects? This list will help you:

#	Project	Stars
1	Milvus	26,490
2	qdrant	17,718
3	annoy	12,662
4	Weaviate	9,359
5	pgvector	8,904
6	Smile	5,914
7	mlpack	4,787
8	docarray	2,730
9	usearch	1,611
10	vald	1,451
11	pgvecto.rs	1,364
12	awesome-vector-search	1,257
13	voyager	1,142
14	similarity	994
15	soundfingerprinting	902
16	pynndescent	837
17	voy	696
18	quaterion	619
19	neighbor	417
20	elastiknn	352
21	pgANN	289
22	TorchPQ	201
23	awesome-vector-database	127