Introduction to Vector Similarity Search

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

  • https://github.com/facebookresearch/faiss

  • pgvector

    Open-source vector similarity search for Postgres

  • https://github.com/pgvector/pgvector

    The `ankane/pgvector` Docker image is a drop-in replacement for the postgres image, so you can fire this up with Docker very quickly.

    It's a normal Postgres database with a vector datatype. It can index the vectors, which allows efficient retrieval. Both AWS RDS and Google Cloud now support this in their managed Postgres offerings, so postgres+pgvector is a viable managed production vector database solution.

    > Also, how granular should the text chunks be?

    That depends on the use case, the size of your corpus, the context length of the model you are using, and how much money you are willing to spend.

    > Has anyone been able to achieve reliable results from these? Preferably w/o using Langchain.

    Definitely. We use postgres+pgvector with php.
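
To make the pgvector description above concrete, here is a minimal sketch of the SQL involved, held in Python strings. The operators, the `vector` type, and the `hnsw` index follow the pgvector README; the table name `chunks`, its columns, the 3-dimensional vectors, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration, not something the commenter described.

```python
# Sketch only: table and column names are hypothetical.

SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    body text,
    embedding vector(3)  -- dimension must match your embedding model
);
"""

# Optional approximate index for faster retrieval on large tables.
INDEX_SQL = "CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);"

INSERT_SQL = "INSERT INTO chunks (body, embedding) VALUES (%s, %s::vector);"

# <=> is pgvector's cosine-distance operator; <-> is Euclidean (L2) distance.
SEARCH_SQL = (
    "SELECT id, body FROM chunks "
    "ORDER BY embedding <=> %s::vector LIMIT %s;"
)


def to_vector_literal(values):
    """Format numbers as pgvector's text input form, e.g. '[1.0,2.5,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Parameters you would pass alongside SEARCH_SQL to a database driver.
search_params = (to_vector_literal([1, 2.5, 3]), 5)
```

The strings would be executed through whatever Postgres driver your language provides; only `to_vector_literal` runs locally here.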

  • Milvus

    A cloud-native vector database, storage for next generation AI applications

  • If you're just starting out, I'd use sentence-transformers for calculating embeddings. You'll want a bi-encoder model, since those produce embeddings. As the author of the blog, I'm partial towards Milvus (https://github.com/milvus-io/milvus) due to its enterprise features and scalability, but FAISS is a great option too if you're just looking for something more local and self-contained.

    Milvus will perform vector search for you - all you need to do is give it a query vector.
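
As a rough illustration of what "give it a query vector" means, the sketch below ranks stored vectors by cosine similarity to a query using only the Python standard library. This is not the Milvus or FAISS API: it is a brute-force scan over a toy, hand-made corpus, whereas those libraries offer index structures that avoid comparing against every vector. The function names and example vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (key, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(key, score) for score, key in scored[:k]]


# Toy "embeddings"; a real system would get these from an embedding model.
corpus = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

results = top_k([1.0, 0.0, 0.0], corpus, k=2)
for key, score in results:
    print(f"{key}: {score:.3f}")
```

With this toy data the two nearest entries to the query are "cat" and then "dog", since both point mostly along the first axis.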

  • chroma

    the AI-native open-source embedding database

  • Ah sorry, I should have read the OP better. Chroma's default embedding model is sentence transformers, and we have many others integrated: https://github.com/chroma-core/chroma/blob/main/chromadb/uti...

    > It would be wonderful if there were a simpler (single file, SQLite or DuckDB like) database for vectors than the complex (and in some cases, unfortunately cloud-based) ones available now.

    This is literally chroma!

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
