Vector database built for scalable similarity search

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Milvus

    A cloud-native vector database, storage for next generation AI applications

    Milvus is completely open source (https://github.com/milvus-io/milvus) and supports a variety of index types (https://milvus.io/docs/overview.md#Index-types), several consistency levels, scalar/metadata filtering, and time travel. We started working on Milvus back in 2018, and 2.0 was released in January 2022 (https://github.com/milvus-io/milvus/releases/tag/v2.0.0).

    For those interested, here's a comparison with other open source vector databases: https://zilliz.com/comparison. For those who don't want to be burdened with installing and maintaining a local database, there's a managed service available as well: https://zilliz.com/cloud.

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

    As another commenter noted, Milvus is overkill and a "bit much" if you're learning/playing.

    A good intro to the field with progression towards a full Milvus implementation could be starting with towhee[0] (which is also supported by Milvus).

    towhee has an example to do exactly what you want with CLIP[1].

    [0] - https://towhee.io/

    [1] - https://github.com/towhee-io/examples/tree/main/image/text_i...

  • typesense-instantsearch-semantic-search-demo

    A demo that shows how to build a semantic search experience with Typesense's vector search feature and Instantsearch.js

    We recently added HNSW-based vector search to Typesense as well: https://typesense.org/docs/0.24.0/api/vector-search.html

    So you can combine attribute-based filters along with nearest-neighbor search.

    Put together this semantic search + filtering demo just last week: https://github.com/typesense/typesense-instantsearch-semanti...
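
To illustrate what combining attribute-based filters with nearest-neighbor search means, here is a minimal sketch in plain Python. It is not Typesense's API and does not use HNSW: it filters first and then does an exact brute-force cosine ranking, and the names `filtered_search`, `docs`, and `attrs` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filtered_search(docs, query_vec, filters, k=2):
    """Keep docs whose attributes match every filter, then rank them by similarity."""
    candidates = [
        d for d in docs
        if all(d["attrs"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(d["vec"], query_vec),
        reverse=True,
    )
    return [d["id"] for d in ranked[:k]]


# Toy data: two-dimensional "embeddings" with one attribute each (illustrative only).
docs = [
    {"id": "a", "vec": [1.0, 0.0], "attrs": {"category": "shoes"}},
    {"id": "b", "vec": [0.9, 0.1], "attrs": {"category": "hats"}},
    {"id": "c", "vec": [0.0, 1.0], "attrs": {"category": "shoes"}},
]

result = filtered_search(docs, [1.0, 0.0], {"category": "shoes"})
print(result)
```

A real engine would replace the brute-force ranking with an approximate index such as HNSW; the sketch only shows how the filter narrows the candidate set before similarity ranking.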

  • ann-benchmarks

    Benchmarks of approximate nearest neighbor libraries in Python

  • pgvector

    Open-source vector similarity search for Postgres

    I really don't want another database. I just want a solution built into Postgres, and more specifically RDS, which we use. I know there will be some extra difficulty that I will have to manage (e.g. reindexing after switching to a new model that outputs different embeddings), but I really don't want another piece of infrastructure.

    If anyone from AWS/Google/Azure is listening, please add pgvector [1] into your managed Postgres offerings!

    1. https://github.com/pgvector/pgvector

  • sqlite-vss

    A SQLite extension for efficient vector search, based on Faiss!

    That is great! I'll keep an eye on it.

    I've been playing with this extension: https://github.com/asg017/sqlite-vss

  • qdrant

    Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

    To be honest, it looks like a huge, overengineered legacy project. What is the point of having all these ANN indexes in place? Is it some kind of art collection? What is the sense when you can just have HNSW in memory, with quantization, on disk, GPU accelerated, etc.? There are already better alternatives like Qdrant, which is written in Rust and super performant (https://github.com/qdrant/qdrant), or Weaviate with its GraphQL interface (https://github.com/weaviate/weaviate).

  • Weaviate

    Weaviate is an open-source vector database that stores both objects and vectors, allowing vector search to be combined with structured filtering, with the fault tolerance and scalability of a cloud-native database.

  • marqo

    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

    Didn't even realise Milvus was so lacking. https://github.com/marqo-ai/marqo also has a hybrid approach. It's just a more complete/end-to-end platform than Pinecone, so it really just depends on what you're building.

  • autofaiss

    Automatically create Faiss knn indices with the most optimal similarity search parameters.

    Don't start with Milvus if you're learning. Too much yak shaving. Try https://github.com/criteo/autofaiss.

    Also, TBH, it is a lot cheaper to run a simple faiss index.

  • vespa

    AI + Data, online. https://vespa.ai

    If ES doesn't work for you, I recommend Vespa. https://github.com/vespa-engine/vespa

    Others have made other suggestions, but Vespa has two unique features. First, it is battle-tested at large scale. Second, it supports combining the keyword and vector scores in several ways. The latter is something that other hybrid systems, including ES/Solr, don't do very well in my experience.
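
As a rough illustration of what combining keyword and vector scores can look like, here is a minimal sketch of one common approach: min-max normalize each score set, then take a weighted sum. This is a generic technique, not Vespa's ranking API, and `hybrid_rank`, `alpha`, and the sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Rank doc ids by alpha * keyword + (1 - alpha) * vector, both normalized."""
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    doc_ids = set(kw) | set(vec)
    combined = {
        d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
        for d in doc_ids
    }
    # Sort by descending combined score; break ties by doc id for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Illustrative scores: BM25-like keyword scores and cosine-like vector scores.
keyword_scores = {"doc1": 12.0, "doc2": 3.0, "doc3": 0.0}
vector_scores = {"doc1": 0.20, "doc2": 0.90, "doc3": 0.85}

ranking = hybrid_rank(keyword_scores, vector_scores)
print(ranking)
```

Setting `alpha` to 1.0 reduces this to pure keyword ranking and 0.0 to pure vector ranking; other fusion schemes (for example rank-based ones) are equally valid choices.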

  • examples

    Analyze unstructured data with Towhee: reverse image search, reverse video search, audio classification, question and answer systems, molecular search, etc. (by towhee-io)

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    txtai combines SQLite and Faiss to enable vector search. It also does a lot more than that.

    https://github.com/neuml/txtai
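
The general pattern of pairing a relational store with a vector index can be sketched as follows. This is not txtai's API: it uses Python's built-in `sqlite3` for document content and a brute-force squared-distance scan as a stand-in for a Faiss index, and `build_store` and `search` are hypothetical names.

```python
import sqlite3


def build_store(rows):
    """Store text in SQLite and keep vectors in a dict keyed by the same id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT)")
    vectors = {}
    for doc_id, text, vec in rows:
        conn.execute("INSERT INTO docs (id, text) VALUES (?, ?)", (doc_id, text))
        vectors[doc_id] = vec
    conn.commit()
    return conn, vectors


def search(conn, vectors, query_vec, k=1):
    """Find the k nearest ids by squared L2 distance, then fetch their text via SQL."""
    def sq_dist(vec):
        return sum((a - b) ** 2 for a, b in zip(vec, query_vec))

    nearest_ids = sorted(vectors, key=lambda i: sq_dist(vectors[i]))[:k]
    results = []
    for doc_id in nearest_ids:
        row = conn.execute(
            "SELECT text FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        results.append((doc_id, row[0]))
    return results


# Toy rows: (id, text, two-dimensional "embedding"), for illustration only.
rows = [
    (1, "red running shoes", [1.0, 0.0]),
    (2, "wool winter hat", [0.0, 1.0]),
    (3, "trail running shoes", [0.9, 0.2]),
]
conn, vectors = build_store(rows)
hits = search(conn, vectors, [1.0, 0.1], k=2)
print(hits)
```

The split keeps each component doing what it is good at: the vector side answers "which ids are closest", and the SQL side answers everything else about those ids.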

  • milvus-lite

    A lightweight version of Milvus wrapped with Python.

    Don't start with the clustered version of Milvus unless you have something like 100 million vectors.

    Try Milvus standalone instead; it's much simpler. I also just found their Python version (https://github.com/milvus-io/embd-milvus), which is quite neat.

  • node-redis

    Redis Node.js client
