Vector databases: analyzing the trade-offs

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

txtai

355 6,990 9.3 Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Adding txtai to the list: https://github.com/neuml/txtai
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling and retrieval augmented generation.

ann-benchmarks

51 4,588 8.1 Python

Benchmarks of approximate nearest neighbor libraries in Python

pg_vector doesn't perform well compared to other methods, at least according to ANN-Benchmarks (https://ann-benchmarks.com/).
txtai is more than just a vector database. It also has a built-in graph component for topic modeling that utilizes the vector index to autogenerate relationships. It can store metadata in SQLite/DuckDB with support for other databases coming. It has support for running LLM prompts right with the data, similar to a stored procedure, through workflows. And it has built-in support for vectorizing data into vectors.
For vector databases that simply store vectors, I agree that it's nothing more than just a different index type.

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
milvus-lite

4 146 5.7 Python

A lightweight version of Milvus wrapped with Python.

Shameless self-plug for our embedded vector database milvus-lite (https://github.com/milvus-io/milvus-lite):
    pip install milvus

Typesense

129 17,876 9.8 C++

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

I work on Typesense [1] (historically considered an open source alternative to Algolia).
We then launched vector search in Jan 2023, and just last week we launched the ability to generate embeddings from within Typesense.
You'd just need to send JSON data, and Typesense can generate embeddings for your data using OpenAI, PaLM API, or built-in models like S-BERT, E-5, etc (running on a GPU if you prefer) [2]
You can then do a hybrid (keyword + semantic) search by just sending the search keywords to Typesense, and Typesense will automatically generate embeddings for you internally and return a ranked list of keyword results weaved with semantic results (using Rank Fusion).
You can also combine filtering, faceting, typo tolerance, etc - the things Typesense already had.
[1] https://github.com/typesense/typesense
[2] https://typesense.org/docs/0.25.0/api/vector-search.html

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project