Vector databases: analyzing the trade-offs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    Adding txtai to the list: https://github.com/neuml/txtai

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

    Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling and retrieval augmented generation.
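    As a rough illustration of "vector search with SQL", here is a minimal sketch using only the Python standard library. Rows live in an SQLite table with their dense vectors stored alongside as JSON. A query first filters with SQL, then ranks the surviving rows by cosine similarity. This is not txtai's API. The table layout, the hypothetical `search` helper and the toy 3-dimensional vectors are assumptions for illustration. A real embeddings database would compute vectors with a model and use an ANN index instead of a linear scan.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_db(rows):
    """Create an in-memory table holding text, a category and a JSON-encoded vector."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, category TEXT, vec TEXT)"
    )
    conn.executemany(
        "INSERT INTO docs (text, category, vec) VALUES (?, ?, ?)",
        [(text, cat, json.dumps(vec)) for text, cat, vec in rows],
    )
    return conn


def search(conn, query_vec, where="1=1", params=(), limit=3):
    """Hypothetical helper: apply an SQL filter first, then rank matches by similarity."""
    # NOTE: `where` is interpolated into the SQL string; only pass trusted clauses.
    cur = conn.execute(f"SELECT id, text, vec FROM docs WHERE {where}", params)
    scored = [(cosine(query_vec, json.loads(vec)), doc_id, text) for doc_id, text, vec in cur]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, text, round(score, 3)) for score, doc_id, text in scored[:limit]]


# Toy 3-d "embeddings"; a real system would compute these with a model.
rows = [
    ("cats are small felines", "animals", [0.9, 0.1, 0.0]),
    ("dogs are loyal pets", "animals", [0.7, 0.3, 0.0]),
    ("stocks fell sharply today", "finance", [0.0, 0.2, 0.9]),
]
conn = build_db(rows)
print(search(conn, [1.0, 0.0, 0.0], where="category = ?", params=("animals",)))
# → [(1, 'cats are small felines', 0.994), (2, 'dogs are loyal pets', 0.919)]
```

    The SQL filter removes the finance row before any similarity is computed. The remaining rows are ordered by how closely their vectors point in the query's direction.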

  • ann-benchmarks

    Benchmarks of approximate nearest neighbor libraries in Python

    pgvector doesn't perform well compared to other methods, at least according to ANN-Benchmarks (https://ann-benchmarks.com/).

    txtai is more than just a vector database. It also has a built-in graph component for topic modeling that uses the vector index to autogenerate relationships. It can store metadata in SQLite/DuckDB, with support for other databases coming. Through workflows, it can run LLM prompts directly alongside the data, similar to a stored procedure. And it has built-in support for converting data into vectors.
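    The idea of using the vector index to autogenerate relationships can be sketched as follows. Each item becomes a node, and two nodes are connected when their cosine similarity clears a threshold. Connected components then serve as rough "topics". This is a simplification for illustration, not how txtai actually builds its graph. The threshold value, the brute-force pairwise comparison and the component-based grouping are all assumptions.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_graph(vectors, threshold=0.8):
    """Adjacency sets: an edge joins i and j when cosine(vec_i, vec_j) >= threshold."""
    graph = {i: set() for i in range(len(vectors))}
    # Brute-force all pairs; a real system would query a vector index per node instead.
    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def topics(graph):
    """Group nodes into connected components, used here as crude 'topics'."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            group.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


vectors = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]
print(topics(similarity_graph(vectors)))  # → [[0, 1], [2, 3]]
```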

    For vector databases that simply store vectors, I agree that it's just a different index type.
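    ANN-Benchmarks compares libraries largely by plotting recall against queries per second. Below is a minimal sketch of the recall side, assuming Euclidean distance. It computes the exact top-k neighbors by brute force and measures what fraction of them an approximate search returned. The `approx` list is made up for illustration and stands in for whatever a real ANN index would return.

```python
import math


def exact_knn(data, query, k):
    """Indices of the k points closest to `query` by Euclidean distance (brute force)."""
    order = sorted(range(len(data)), key=lambda i: math.dist(data[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [0.5, 0.5]]
query = [0.1, 0.1]
truth = exact_knn(data, query, k=3)
approx = [0, 4, 3]  # hypothetical output of an approximate index
print(truth, recall_at_k(approx, truth))  # truth is [0, 4, 1]; recall is 2/3
```

    An index that trades accuracy for speed shows up here as recall below 1.0. Benchmarks like ANN-Benchmarks report how much throughput each library gains for a given recall level.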

  • milvus-lite

    A lightweight version of Milvus

    Shameless self-plug for our embedded vector database milvus-lite (https://github.com/milvus-io/milvus-lite):

        pip install milvus

  • Typesense

    Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

    I work on Typesense [1] (historically considered an open source alternative to Algolia).

    We then launched vector search in Jan 2023, and just last week we launched the ability to generate embeddings from within Typesense.

    You'd just need to send JSON data, and Typesense can generate embeddings for your data using OpenAI, PaLM API, or built-in models like S-BERT, E-5, etc. (running on a GPU if you prefer) [2].

    You can then do a hybrid (keyword + semantic) search by just sending the search keywords to Typesense. Typesense will automatically generate embeddings for you internally and return a ranked list of keyword results woven together with semantic results (using Rank Fusion).

    You can also combine this with filtering, faceting, typo tolerance, etc., the things Typesense already had.

    [1] https://github.com/typesense/typesense

    [2] https://typesense.org/docs/0.25.0/api/vector-search.html
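    The Typesense docs are the authority on the exact fusion formula it uses. As a generic illustration of how a keyword ranking and a semantic ranking can be merged into one list, here is Reciprocal Rank Fusion (RRF), one common rank-fusion method. It is not necessarily Typesense's implementation. The constant k = 60 is a commonly used default, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1; higher totals rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical keyword ranking
semantic_hits = ["doc_c", "doc_d", "doc_a"]  # hypothetical vector ranking
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

    Documents found by both searches (doc_a, doc_c) rise above those found by only one. RRF needs only ranks, not raw scores, which is why it suits mixing BM25-style keyword scores with vector similarities that live on different scales.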

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


