DiskANN
Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search
There is certainly a wide variety of problems today for which pgvector is unsuitable due to performance limitations... but fear not! This is an area that is getting significant focus right now.
A marqo.ai dev is currently working on adding HNSW-IVF and HNSW support to pgvector (https://news.ycombinator.com/item?id=35551684), and the maintainer recently noted that they are actively working on an IVFPQ/ScaNN implementation (https://github.com/pgvector/pgvector/issues/93).
The pgANN creator actually asked about performance a month ago here: https://github.com/pgvector/pgvector/issues/58
Expect to see performance improve dramatically later this year.
Thought this was about https://github.com/matrix-org/pinecone
Spot on. There is zero moat, and the self-hosted alternatives are rapidly improving, if not already better than Pinecone. There are good open-source contributions coming from bigcorp beyond Meta too, e.g., DiskANN (https://github.com/microsoft/DiskANN).
Why do you use pgvector instead of pgANN? My understanding is pgANN is built with FAISS. When I compared pgvector with FAISS, pgvector was 3-5x slower.
https://github.com/netrasys/pgANN