Introduction to Vector Similarity Search

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

    https://github.com/facebookresearch/faiss

  • pgvector

    Open-source vector similarity search for Postgres

    https://github.com/pgvector/pgvector

    The `ankane/pgvector` Docker image is a drop-in replacement for the Postgres image, so you can get it running with Docker very quickly.

    It's a normal Postgres database with a `vector` data type added. It can index the vectors for efficient retrieval. Both AWS RDS and Google Cloud now support pgvector in their managed Postgres offerings, so Postgres plus pgvector is a viable managed vector database for production.

    > Also, how granular should the text chunks be?

    That depends on the use case, the size of your corpus, the context window of the model you are using, and how much money you are willing to spend.

    > Has anyone been able to achieve reliable results from these? Preferably w/o using Langchain.

    Definitely. We use Postgres + pgvector with PHP.

  • Milvus

    A cloud-native vector database, storage for next-generation AI applications

    If you're just starting out, I'd use sentence-transformers for calculating embeddings. You'll want a bi-encoder model, since a bi-encoder turns each text into its own embedding (a cross-encoder scores a pair of texts together and does not produce reusable embeddings). As the author of the blog, I'm partial towards Milvus (https://github.com/milvus-io/milvus) because of its enterprise features and scale, but FAISS is a great option too if you're just looking for something more local and self-contained.

    Milvus will perform the vector search for you; all you need to do is give it a query vector.

  • chroma

    The AI-native open-source embedding database

    Ah, sorry, I should have read OP more carefully. Chroma's default embedding model comes from sentence-transformers, and we have many others integrated: https://github.com/chroma-core/chroma/blob/main/chromadb/uti...

    > It would be wonderful if there were a simpler (single file, SQLite or DuckDB like) database for vectors than the complex (and in some cases, unfortunately cloud-based) ones available now.

    This is literally chroma!
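
The chroma comment above answers a request for a simple single-file vector database in the style of SQLite or DuckDB. This is a minimal sketch of how little a small corpus needs, assuming the embeddings are already computed. It stores embeddings as float32 blobs in one SQLite file using Python's standard `sqlite3` and `struct` modules, and it ranks results by brute-force cosine similarity. The table layout and the helper names (`open_store`, `add`, `query`, `pack`, `unpack`) are hypothetical. This is not chroma's API or storage format, and a linear scan will not scale the way an indexed store does.

```python
# Hypothetical single-file vector store sketch; not chroma's API or format.
import math
import sqlite3
import struct


def pack(vec: list[float]) -> bytes:
    # Serialize the vector as little-endian float32 values.
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    # Each float32 occupies 4 bytes.
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # Pass a filename such as "vectors.db" to keep everything in one file.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)"
    )
    return conn


def add(conn: sqlite3.Connection, text: str, vec: list[float]) -> None:
    conn.execute(
        "INSERT INTO items (text, embedding) VALUES (?, ?)", (text, pack(vec))
    )
    conn.commit()


def cosine(a: list[float], b: list[float]) -> float:
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def query(conn: sqlite3.Connection, vec: list[float], k: int = 3) -> list[tuple[float, str]]:
    # Linear scan: load every stored embedding and rank by cosine similarity.
    rows = conn.execute("SELECT text, embedding FROM items").fetchall()
    scored = [(cosine(vec, unpack(blob)), text) for text, blob in rows]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    conn = open_store()
    add(conn, "north", [0.0, 1.0])
    add(conn, "east", [1.0, 0.0])
    add(conn, "northeast", [0.7, 0.7])
    print(query(conn, [0.1, 0.9], k=2))
```

Passing a path such as `vectors.db` to `open_store` instead of relying on the in-memory default persists the whole store in that one file.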

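On the chunk-granularity question in the pgvector thread: whatever size you settle on, a common way to split text is into fixed-size windows with some overlap, so that text near a boundary appears in both neighboring chunks. The sketch below assumes whitespace-separated words are the unit. Real pipelines often count model tokens instead, to stay within the model's context window. `chunk_words`, `chunk_size`, and `overlap` are hypothetical names, and the defaults are placeholders rather than values suggested in the thread.

```python
# Hypothetical chunking helper; defaults are placeholders, not recommendations.
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words, so text near a chunk
    boundary appears in both neighboring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks give more precise matches but less context per hit. Larger chunks do the opposite, which is why the answer above ties the choice to the use case and the model.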

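The Milvus comment above recommends a bi-encoder. A bi-encoder maps each text to its own vector independently, so documents can be embedded once ahead of time and only the query needs embedding at search time. The sketch below shows that two-step flow with a toy stand-in encoder. `toy_embed` is a hypothetical feature-hashing bag-of-words function used in place of a real sentence-transformers model. A real model would produce dense semantic vectors rather than hashed word counts. `DIM`, `documents`, and `search` are also illustrative names.

```python
# Hypothetical bi-encoder flow; toy_embed stands in for a real model.
import hashlib
import math

DIM = 64  # hypothetical embedding size for the toy encoder


def toy_embed(text: str) -> list[float]:
    """Hash each lowercase word into one of DIM buckets, then L2-normalize.

    sha256 is used because Python's built-in hash() of str is salted per
    process, which would make buckets change between runs. Not semantic.
    """
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Offline step: each document is encoded once, independently of any query.
documents = [
    "postgres extension for vector similarity search",
    "library for clustering dense vectors",
    "recipe for sourdough bread",
]
doc_vectors = [toy_embed(d) for d in documents]


def search(query: str, k: int = 2) -> list[tuple[float, str]]:
    # Online step: only the query is encoded. Because every vector is unit
    # length, the dot product equals cosine similarity.
    q = toy_embed(query)
    scored = [(dot(q, v), d) for v, d in zip(doc_vectors, documents)]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    for score, doc in search("vector similarity search in postgres"):
        print(f"{score:.3f}  {doc}")
```

A cross-encoder, by contrast, would have to run the model once per (query, document) pair at search time. That is why it suits re-ranking a short list better than building a vector index.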
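
faiss, Milvus, and pgvector all answer the same basic query: given a query vector, return the k stored vectors closest to it. The simplest correct way to do that is an exhaustive scan, which is what a flat (non-approximate) index computes. The libraries above add approximate indexes and optimized kernels to make this fast at scale. The pure-Python sketch below only illustrates exact Euclidean (L2) nearest-neighbor search with `heapq`. It is not faiss's API. `l2_distance` and `knn_exact` are hypothetical names.

```python
# Hypothetical exact k-NN sketch; illustrates what a flat index computes.
import heapq
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_exact(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> list[tuple[float, int]]:
    """Return (distance, index) pairs for the k vectors nearest to query,
    closest first, by scanning every stored vector."""
    return heapq.nsmallest(
        k, ((l2_distance(query, v), i) for i, v in enumerate(vectors))
    )


if __name__ == "__main__":
    db = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    print(knn_exact([0.9, 0.1], db, k=2))
```

The scan costs one distance computation per stored vector for every query. Approximate indexes trade a small amount of recall to avoid that linear cost.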

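pgvector exposes its distance functions as SQL operators. `<->` is Euclidean (L2) distance, `<#>` is the negative inner product, and `<=>` is cosine distance. A nearest-neighbor query therefore sorts ascending by one of them, as in `ORDER BY embedding <-> '[...]' LIMIT 5`. The Python sketch below only mirrors what each operator computes, to show that the three orderings can disagree. It does not connect to Postgres. The function names and sample vectors are hypothetical.

```python
# Hypothetical mirror of pgvector's distance operators; no database involved.
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Like <->: Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neg_inner_product(a: Vector, b: Vector) -> float:
    """Like <#>: inner product, negated so that smaller means closer."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """Like <=>: 1 minus cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def order_by(
    rows: dict[str, Vector],
    query: Vector,
    distance: Callable[[Vector, Vector], float],
    limit: int,
) -> list[str]:
    """Mimic ORDER BY embedding <op> query LIMIT n (ascending distance)."""
    return sorted(rows, key=lambda name: distance(rows[name], query))[:limit]


if __name__ == "__main__":
    rows = {"a": [2.0, 0.0], "b": [0.9, 0.1], "c": [10.0, 10.0]}
    q = [1.0, 0.0]
    for op in (l2, neg_inner_product, cosine_distance):
        print(op.__name__, order_by(rows, q, op, 3))
```

With these sample vectors, each operator ranks a different row first. L2 favors the nearby point, inner product favors the longest vector, and cosine favors the row pointing in exactly the same direction. The operator should therefore match the metric the embedding model was trained for.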
Related posts

  • Simplifying the Milvus Selection Process

    3 projects | dev.to | 19 Feb 2024
  • Milvus Adventures Dec 15, 2023

    1 project | dev.to | 15 Dec 2023
  • GPU-Accelerated Indexing in LanceDB

    1 project | news.ycombinator.com | 3 Nov 2023
  • Code Search with Vector Embeddings: A Transformer's Approach

    3 projects | dev.to | 27 Aug 2023
  • Implementing Vector Database for AI

    1 project | dev.to | 23 Aug 2023