Vector databases: analyzing the trade-offs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • Adding txtai to the list: https://github.com/neuml/txtai

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

    Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling and retrieval augmented generation.
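    To make the "vector search with SQL" idea concrete, here is a minimal sketch that uses only the Python standard library. It is not txtai's API or implementation: the table layout, the `similarity` SQL function, and the hand-written 3-dimensional vectors are all assumptions chosen for illustration. A real system would use an embedding model and a proper vector index rather than a full scan.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy data: (id, text, topic, hand-written 3-d "embedding").
DOCS = [
    (1, "intro to vector search", "search", [0.9, 0.1, 0.0]),
    (2, "postgres tuning guide", "database", [0.1, 0.9, 0.1]),
    (3, "semantic search with embeddings", "search", [0.8, 0.2, 0.1]),
    (4, "graph databases explained", "database", [0.2, 0.7, 0.6]),
]


def build_db():
    """Create an in-memory SQLite table holding text, metadata and vectors."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, topic TEXT, vector TEXT)"
    )
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?, ?)",
        [(i, text, topic, json.dumps(vec)) for i, text, topic, vec in DOCS],
    )
    # Register similarity as a SQL function so it can sit next to ordinary filters.
    conn.create_function(
        "similarity",
        2,
        lambda stored, query: cosine(json.loads(stored), json.loads(query)),
    )
    return conn


def search(conn, query_vector, topic, limit=2):
    """Rank rows by vector similarity, restricted by a relational filter."""
    rows = conn.execute(
        "SELECT id, text, similarity(vector, ?) AS score "
        "FROM docs WHERE topic = ? ORDER BY score DESC LIMIT ?",
        (json.dumps(query_vector), topic, limit),
    )
    return rows.fetchall()


conn = build_db()
results = search(conn, [1.0, 0.0, 0.0], topic="search")
```

    The point of the sketch is the query shape: a similarity score and a metadata filter combined in one SQL statement, which is the kind of union the comment describes.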

  • ann-benchmarks

    Benchmarks of approximate nearest neighbor libraries in Python

  • pgvector doesn't perform well compared to other methods, at least according to ANN-Benchmarks (https://ann-benchmarks.com/).

    txtai is more than just a vector database. It also has a built-in graph component for topic modeling that uses the vector index to autogenerate relationships. It can store metadata in SQLite/DuckDB, with support for other databases coming. It supports running LLM prompts right alongside the data, similar to a stored procedure, through workflows. And it has built-in support for converting data into vectors.

    For vector databases that simply store vectors, I agree that it's nothing more than just a different index type.
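    Comparisons like the ANN-Benchmarks one mentioned above typically weigh an index's recall against its query throughput. As a rough sketch of the recall side only, the snippet below computes recall@k for an approximate result against a brute-force ground truth. The vectors, the query, and the "approximate" result list are made-up values for illustration; this is not the ANN-Benchmarks harness.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_neighbors(vectors, query, k):
    """Brute-force ground truth: ids of the k vectors closest to the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: euclidean(vectors[i], query))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate result found."""
    if not exact_ids:
        return 0.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical toy dataset and query.
VECTORS = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [0.5, 0.5]]
QUERY = [0.1, 0.1]

truth = exact_neighbors(VECTORS, QUERY, k=3)

# Hypothetical output of an approximate index that missed one true neighbor.
approx = [0, 4, 3]
recall = recall_at_k(approx, truth)
```

    An index that returns every true neighbor scores 1.0; the trade-off being benchmarked is how much speed an index gains for each bit of recall it gives up.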

  • milvus-lite

    A lightweight version of Milvus wrapped with Python.

  • Shameless self-plug for our embedded vector database milvus-lite (https://github.com/milvus-io/milvus-lite):

        pip install milvus

  • Typesense

    Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

  • I work on Typesense [1] (historically considered an open source alternative to Algolia).

    We launched vector search in Jan 2023, and just last week we launched the ability to generate embeddings from within Typesense.

    You just need to send JSON data, and Typesense can generate embeddings for it using OpenAI, the PaLM API, or built-in models like S-BERT, E-5, etc. (running on a GPU if you prefer) [2].

    You can then do a hybrid (keyword + semantic) search by just sending the search keywords to Typesense. Typesense automatically generates embeddings for the query internally and returns a ranked list of keyword results interleaved with semantic results (using Rank Fusion).

    You can also combine this with filtering, faceting, typo tolerance, etc., the things Typesense already had.

    [1] https://github.com/typesense/typesense

    [2] https://typesense.org/docs/0.25.0/api/vector-search.html
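    The comment above says hybrid results are merged with Rank Fusion but does not give the formula. One widely used variant is Reciprocal Rank Fusion (RRF), sketched below. Whether Typesense uses this exact formula, the constant `k=60`, or equal weighting of the two lists is an assumption here, and the document ids are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list.

    Each id scores 1 / (k + rank) for every list it appears in (rank starts
    at 1), and ids are returned by descending total score. k=60 is a commonly
    used default, assumed here rather than taken from Typesense.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Break score ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], str(doc_id)))


# Hypothetical result lists from a keyword search and a semantic search.
keyword_hits = ["a", "b", "c"]
semantic_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

    Documents found by both searches ("a" and "c") rise above those found by only one, which is the behavior a hybrid search wants. Because RRF only looks at ranks, it needs no normalization between keyword scores and vector distances.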

