You Shouldn't Invest in Vector Databases?

txtai

Mentions: 355 | Stars: 6,990 | Activity: 9.3 | Language: Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

You can try txtai (https://github.com/neuml/txtai) with a Faiss backend.
This Faiss wiki article might help (https://github.com/facebookresearch/faiss/wiki/Indexing-1G-v...).
For example, the partial Faiss configuration below uses 4-bit PQ quantization and trains the IVF index on only 5% of the data.
faiss={"components": "IVF,PQ384x4fs", "sample": 0.05}
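To get a feel for what the `PQ384x4fs` part of that setting buys, here is a rough sketch of the storage arithmetic. It assumes 384-dimensional float32 embeddings (so each dimension gets its own 4-bit code), and the helper names `parse_pq` and `pq_code_bytes` are made up for illustration; they are not part of txtai or Faiss. The estimate covers only the PQ codes and ignores IVF overhead such as centroids and stored ids.

```python
import re

# Partial configuration from the comment above, written as a Python dict.
# "backend": "faiss" reflects the suggestion to use txtai with a Faiss backend.
config = {
    "backend": "faiss",
    "faiss": {"components": "IVF,PQ384x4fs", "sample": 0.05},
}


def parse_pq(components):
    """Extract (sub-quantizers, bits per code) from a 'PQ<M>x<bits>' token (hypothetical helper)."""
    match = re.search(r"PQ(\d+)x(\d+)", components)
    if match is None:
        raise ValueError("no PQ<M>x<bits> token found")
    return int(match.group(1)), int(match.group(2))


def pq_code_bytes(subquantizers, bits):
    """Bytes needed for one vector's PQ codes, rounded up to a whole byte."""
    return (subquantizers * bits + 7) // 8


def float32_bytes(dimensions):
    """Bytes needed to store one uncompressed float32 vector."""
    return dimensions * 4


m, bits = parse_pq(config["faiss"]["components"])
compressed = pq_code_bytes(m, bits)   # 384 codes * 4 bits = 192 bytes
raw = float32_bytes(384)              # assumption: 384-dimensional embeddings
ratio = raw / compressed

print(f"PQ codes: {compressed} bytes per vector, raw float32: {raw} bytes, ratio: {ratio:.0f}x")
```

Under these assumptions the codes take roughly an eighth of the raw vector storage, which is the main reason to accept the accuracy loss that comes with quantization.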

faiss

Mentions: 70 | Stars: 28,202 | Activity: 9.4 | Language: C++

A library for efficient similarity search and clustering of dense vectors.

pgvecto.rs

Mentions: 17 | Stars: 1,375 | Activity: 9.3 | Language: Rust

Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database.

It's kind of a tradeoff. Performance is just one factor when choosing a vector database. In pgvecto.rs (https://github.com/tensorchord/pgvecto.rs), we store the index separately from PostgreSQL's internal storage, unlike pgvector. This enables multi-threaded indexing, asynchronous indexing that does not block inserts, and faster search than pgvector.
I don't see any fundamental reason why an index in Postgres would be slower than one in a specialized vector database. The query pattern of a vector database is simply a point query using an index, similar to other queries in an OLTP system.
The only limitation I see is scalability. It's not easy to make PostgreSQL distributed, but solutions like Citus exist, so it is still possible.
(I'm the author of pgvecto.rs)
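
The "point query using an index" pattern described in the comment above can be sketched with a toy inverted-file (IVF) index. This is not how pgvecto.rs or pgvector is implemented; the class `TinyIVF`, its fixed centroids (a real index would train them, for example with k-means), and the sample vectors are all assumptions chosen to keep the example small.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TinyIVF:
    """Toy inverted-file index: each vector is filed under its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]

    def _nearest_lists(self, vec, n):
        # Rank centroids by distance to the vector and keep the n closest.
        order = sorted(range(len(self.centroids)), key=lambda i: l2(vec, self.centroids[i]))
        return order[:n]

    def insert(self, row_id, vec):
        # An insert touches a single list, much like adding one index entry for a row.
        self.lists[self._nearest_lists(vec, 1)[0]].append((row_id, vec))

    def query(self, vec, k, nprobe=1):
        # A query scans only the nprobe closest lists instead of every row.
        candidates = []
        for i in self._nearest_lists(vec, nprobe):
            candidates.extend(self.lists[i])
        candidates.sort(key=lambda item: l2(vec, item[1]))
        return [row_id for row_id, _ in candidates[:k]]


index = TinyIVF(centroids=[(0.0, 0.0), (10.0, 10.0)])
index.insert(1, (0.1, 0.2))
index.insert(2, (0.9, 1.1))
index.insert(3, (9.5, 10.2))
index.insert(4, (10.4, 9.8))

print(index.query((10.0, 10.0), k=2))  # scans only the list around (10, 10)
```

Each lookup takes one query vector, consults the index, and returns a handful of row ids, which is the same shape as an indexed lookup in an OLTP workload. Raising `nprobe` trades speed for recall by scanning more lists.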

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.