Pinecone: Rust -- A hard decision pays off

This page summarizes the projects mentioned and recommended in the original post on /r/rust.

  • qdrant

    Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

  • Vector similarity search seems like a killer app for Rust. You basically need people familiar with the machine learning ecosystem to write low-level code. Either you get the best C++ developers, who can handle all of your concurrency thorns, or you teach Python developers Rust, which guarantees they won’t shoot themselves (and your clients) in the foot. One reason I was hesitant to use Pinecone for our production needs in the past was its heavy reliance on Python. Now I will take another look. (Also looking at qdrant

  • pgvector

    Open-source vector similarity search for Postgres

  • Vector similarity search benefits greatly from an in-memory representation. Because you’re dealing with fixed-size arrays, querying the vectors is embarrassingly parallel, which also makes it amenable to GPU computation. I’m aware of a Postgres extension, but it doesn’t load data into memory by default. In my quick investigations I’ve never seen how you could get equivalent performance with persistence. The in-memory models allow millisecond queries even without Approximate Nearest Neighbour (ANN) indices. When I tested a simple query over about 100,000 rows in Postgres using a custom function, a table scan took something like 50 seconds (from my sketchy memory, not a benchmark). With an in-memory vector DB it’s about 10 ms. In both cases ANN indices improve performance, but unlike traditional DB indices they trade accuracy for speed.
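
To make the "fixed array sizes, embarrassingly parallel" point concrete, here is a minimal sketch of an exact (non-ANN) in-memory search in Rust. It is not how Pinecone, qdrant, or pgvector implement search. The function names (`dot`, `cosine`, `top_k`), the flat row-major `f32` layout, and the choice of cosine similarity are all assumptions made for illustration, and it uses only the standard library.

```rust
use std::thread;

/// Dot product of two equal-length slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    if denom == 0.0 { 0.0 } else { dot(a, b) / denom }
}

/// Hypothetical helper: exact top-k search over a flat, row-major buffer
/// holding vectors of `dim` floats each. Rows are split into contiguous
/// chunks and each chunk is scored on its own scoped thread.
fn top_k(data: &[f32], dim: usize, query: &[f32], k: usize, threads: usize) -> Vec<(usize, f32)> {
    assert!(dim > 0 && data.len() % dim == 0 && query.len() == dim);
    let rows = data.len() / dim;
    let threads = threads.max(1);
    // Ceiling division, at least one row per chunk.
    let rows_per_chunk = ((rows + threads - 1) / threads).max(1);

    let mut scored: Vec<(usize, f32)> = thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(rows_per_chunk * dim)
            .enumerate()
            .map(|(c, chunk)| {
                s.spawn(move || {
                    chunk
                        .chunks_exact(dim)
                        .enumerate()
                        // Recover the global row index from chunk index + offset.
                        .map(|(i, row)| (c * rows_per_chunk + i, cosine(row, query)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });

    // Highest similarity first; keep only the k best rows.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Because every row has the same length, each thread can score its chunk without any coordination, which is the property the comment above relies on. A real system would likely replace the full sort with a bounded heap and add SIMD or GPU kernels; those are left out to keep the sketch short.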

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
