Pinecone: Rust -- A hard decision pays off

qdrant

140 17,839 9.9 Rust

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Vector similarity search seems like a killer app for Rust. You basically need people familiar with the machine learning ecosystem to write low-level code. Either you hire the best C++ developers, who can handle all of your concurrency thorns, or you teach Python developers Rust, which guarantees they won't shoot themselves (and your clients) in the foot. One reason I was hesitant to use Pinecone in the past for our production needs was its heavy reliance on Python. Now I will take another look. (Also looking at qdrant.)

pgvector

78 9,211 9.9 C

Open-source vector similarity search for Postgres

Vector similarity search benefits greatly from an in-memory representation. Because you're dealing with fixed array sizes, querying the vectors is embarrassingly parallel, which also makes it amenable to GPU computation. I'm aware of a Postgres extension, but by default it doesn't load data into memory. In my quick investigations I've never seen how you could get equivalent performance with persistence. The in-memory models allow millisecond queries even without Approximate Nearest Neighbour (ANN) indices. When I tested a simple query over about 100,000 rows in Postgres using a custom function, a table scan took something like 50 seconds (from my sketchy memory, not a benchmark). With an in-memory vector DB it's about 10 ms. In both cases ANN indices improve performance, but unlike traditional DB indices they trade accuracy for speed.
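To make the "embarrassingly parallel" point concrete, here is a minimal sketch of an exact (non-ANN) in-memory scan. It is not how Pinecone, qdrant, or pgvector are implemented; the function names (`cosine_similarity`, `scan_chunk`, `brute_force_top_k`) and the choice of cosine similarity are assumptions made for illustration. The chunks are scanned sequentially here, but each chunk is independent of the others, so in principle they could be handed to separate threads, processes, or GPU blocks.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def scan_chunk(chunk, query, k):
    """Exact top-k inside one chunk; chunk is a list of (id, vector) pairs."""
    scored = ((cosine_similarity(vec, query), vid) for vid, vec in chunk)
    return heapq.nlargest(k, scored)


def brute_force_top_k(vectors, query, k=3, n_chunks=4):
    """Split the data into chunks, scan each one, then merge the partial results."""
    items = list(enumerate(vectors))
    size = max(1, math.ceil(len(items) / n_chunks))
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    # Each scan_chunk call touches only its own chunk, so this loop is the
    # part that could run in parallel (hypothetical; kept sequential here).
    partial = [scan_chunk(chunk, query, k) for chunk in chunks]
    return heapq.nlargest(k, (hit for hits in partial for hit in hits))


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    for score, vid in brute_force_top_k(data, [1.0, 0.0], k=2, n_chunks=2):
        print(vid, round(score, 3))
```

Because every vector is compared against the query, the result is exact. An ANN index skips most of those comparisons, which is where the accuracy tradeoff mentioned above comes from.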

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
