Introduction to Vector Similarity Search

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews


faiss

Mentions: 70 | Stars: 28,202 | Activity: 9.4 | Language: C++

A library for efficient similarity search and clustering of dense vectors.
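To make "similarity search of dense vectors" concrete, here is a minimal pure-Python sketch of exact (brute-force) k-nearest-neighbour search under Euclidean distance. It is not faiss's API. faiss's `IndexFlatL2` performs this kind of exhaustive comparison in optimized C++, and its approximate indexes trade some recall for speed. The function names `squared_l2` and `flat_search` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(
    database: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[float, int]]:
    """Exhaustive k-nearest-neighbour search.

    Scores the query against every stored vector and returns the k closest
    as (squared_distance, row_index) pairs, nearest first.
    """
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(database))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
    print(flat_search(db, [1.0, 1.0], k=2))
```

The cost is one distance computation per stored vector per query. That is fine for thousands of vectors, and it is the baseline that approximate indexes are designed to beat.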

https://github.com/facebookresearch/faiss

pgvector

Mentions: 78 | Stars: 9,211 | Activity: 9.9 | Language: C

Open-source vector similarity search for Postgres

https://github.com/pgvector/pgvector
The `ankane/pgvector` Docker image is a drop-in replacement for the postgres image, so you can fire this up with Docker very quickly.
It's a normal Postgres database with a `vector` data type. It can index the vectors and retrieve nearest neighbours efficiently. Both AWS RDS and Google Cloud now support it in their managed Postgres offerings, so Postgres + pgvector is a viable managed, production-grade vector database solution.
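Here is a rough sketch of what "a normal Postgres database with a vector data type" looks like from application code. It assumes pgvector 0.5.0 or later, which is needed for the `hnsw` index type. The table name `items`, its columns and the 3-dimensional `vector(3)` column are hypothetical. The distance operators and the `vector_l2_ops` operator class are the ones pgvector documents. The Python only builds SQL strings, so it runs without a database.

```python
from typing import Sequence

# Distance operators as documented by pgvector.
OPERATORS = {
    "l2": "<->",             # Euclidean distance
    "inner_product": "<#>",  # negative inner product
    "cosine": "<=>",         # cosine distance
}

# One-time setup; the `items` table and its columns are hypothetical.
# The hnsw index type requires pgvector 0.5.0 or later.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)
);
CREATE INDEX IF NOT EXISTS items_embedding_idx
    ON items USING hnsw (embedding vector_l2_ops);
"""


def to_vector_literal(vec: Sequence[float]) -> str:
    """Render numbers in pgvector's text input format, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


def knn_query(metric: str = "l2") -> str:
    """Build a nearest-neighbour query with DB-API placeholders.

    The two %s parameters are the query vector (passed as a text literal and
    cast to vector) and the result limit; the driver handles quoting.
    """
    op = OPERATORS[metric]
    return f"SELECT id, content FROM items ORDER BY embedding {op} %s::vector LIMIT %s"
```

With a DB-API driver such as psycopg, you would first run the statements in `SETUP_SQL` once. A search would then look roughly like `cur.execute(knn_query("l2"), (to_vector_literal(q), 5))`. Postgres can only use the index when its operator class matches the operator in `ORDER BY` (here `vector_l2_ops` with `<->`). Cosine queries would need their own index built with `vector_cosine_ops`.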
> Also, how granular should the text chunks be?
That depends on the use case, the size of your corpus, the context window of the model you are using, and how much money you are willing to spend.
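One common way to turn "how granular should the chunks be" into a tunable knob is a fixed-size sliding window with overlap. This sketch counts whitespace-separated words as a rough stand-in for model tokens. The function `chunk_words` and its defaults (`chunk_size=200`, `overlap=50`) are hypothetical illustration values, not recommendations. They should be tuned against the factors listed above.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words, so text near a chunk boundary
    also appears, with some surrounding context, in the neighbouring chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Smaller chunks give more precise matches but less context per hit. Larger chunks cost more tokens when they are passed to a model.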
> Has anyone been able to achieve reliable results from these? Preferably w/o using Langchain.
Definitely. We use Postgres + pgvector with PHP.

Milvus

Mentions: 104 | Stars: 26,857 | Activity: 10.0 | Language: Go

A cloud-native vector database, storage for next generation AI applications

If you're just starting out, I'd use sentence-transformers for calculating embeddings. You'll want a bi-encoder model, since bi-encoders are the ones that produce standalone embeddings. As the author of the blog, I'm partial towards Milvus (https://github.com/milvus-io/milvus) due to its enterprise features and scalability, but FAISS is a great option too if you're just looking for something more local and self-contained.
Milvus will perform the vector search for you: all you need to do is give it a query vector.
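The point about bi-encoders is architectural. A bi-encoder maps each text to a vector on its own, so document vectors can be computed once and stored in Milvus (or FAISS), and at query time only the query is encoded. A cross-encoder instead scores a (query, document) pair jointly and yields no reusable embedding. The sketch below shows the bi-encoder flow with a deliberately crude, hypothetical hashed bag-of-words `toy_encode` standing in for a real model. With sentence-transformers, a model's `encode` method would take its place.

```python
import hashlib
import math
from typing import List, Sequence

DIM = 64  # hypothetical embedding width for the toy encoder


def toy_encode(text: str) -> List[float]:
    """Toy stand-in for a bi-encoder: one text in, one fixed-size vector out.

    Each lowercase word is hashed into one of DIM buckets, so this captures
    word overlap only, not meaning; a trained model would replace it.
    """
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:8], "big") % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # toy_encode returns unit-length vectors (or all zeros for empty text),
    # so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class BiEncoderIndex:
    def __init__(self, docs: Sequence[str]):
        self.docs = list(docs)
        # Offline step: every document is encoded once, independently of any query.
        self.vectors = [toy_encode(d) for d in self.docs]

    def search(self, query: str, k: int = 3) -> List[str]:
        # Online step: only the query is encoded; stored vectors are reused.
        q = toy_encode(query)
        ranked = sorted(
            range(len(self.docs)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.docs[i] for i in ranked[:k]]
```

In a real deployment, the list comprehension in `__init__` corresponds to embedding and inserting your corpus into the vector database. `search` corresponds to embedding the user's query and handing that single vector to the database.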

chroma

Mentions: 32 | Stars: 12,189 | Activity: 9.7 | Language: Python

the AI-native open-source embedding database

Ah, sorry, I should have read OP more carefully. Chroma's default embedding model is sentence-transformers, and we have many others integrated: https://github.com/chroma-core/chroma/blob/main/chromadb/uti...
> It would be wonderful if there were a simpler (single file, SQLite or DuckDB like) database for vectors than the complex (and in some cases, unfortunately cloud-based) ones available now.
This is literally chroma!
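To illustrate the "single file, SQLite-like" idea from the quote, here is a minimal sketch built on Python's standard `sqlite3` module. Each row, including its embedding serialized as JSON, lives in one database file, and search is an exhaustive cosine-similarity scan in Python. This is not how Chroma is implemented. `TinyVectorStore` and its schema are hypothetical, and the embeddings are assumed to come from some external model.

```python
import json
import math
import sqlite3
from typing import List, Sequence, Tuple


class TinyVectorStore:
    """Single-file vector store: SQLite holds the rows, Python does the math.

    Search scores every row, which is fine for small collections but is
    exactly what approximate indexes exist to avoid at scale.
    """

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT, embedding TEXT)"
        )

    def add(self, doc_id: str, body: str, embedding: Sequence[float]) -> None:
        # Embeddings are stored as JSON text to keep the schema simple.
        self.conn.execute(
            "INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
            (doc_id, body, json.dumps(list(embedding))),
        )
        self.conn.commit()

    def query(self, embedding: Sequence[float], k: int = 3) -> List[Tuple[float, str, str]]:
        """Return up to k (cosine_similarity, id, body) tuples, most similar first."""
        q_norm = math.sqrt(sum(x * x for x in embedding))
        results = []
        for doc_id, body, raw in self.conn.execute("SELECT id, body, embedding FROM docs"):
            vec = json.loads(raw)
            denom = q_norm * math.sqrt(sum(x * x for x in vec))
            sim = sum(a * b for a, b in zip(embedding, vec)) / denom if denom else 0.0
            results.append((sim, doc_id, body))
        results.sort(key=lambda r: r[0], reverse=True)
        return results[:k]
```

Passing a file path such as `"vectors.db"` instead of the default `":memory:"` keeps the whole collection in a single file on disk.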

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
