Solr’s Dense Vector Search for indexing and searching dense numerical vectors

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Milvus

    A cloud-native vector database, storage for next generation AI applications

  • There's no one answer to this, but I'd say that anything past 10k vectors would benefit greatly from a vector database. A vector DB will abstract away the building of a vector index along with other core database features such as caching, failover, replication, horizontal scaling, etc. Milvus (https://milvus.io) is open-source and always my go-to choice for this (disclaimer: I'm a part of the Milvus community). An added bonus of Milvus is that it supports GPU-accelerated indexing and search.

    All of this assumes you're okay with a bit of imprecision: vector search with modern indexes is inherently probabilistic, i.e. your recall may not be 100%, but it will be close. Using a flat indexing strategy is still an option, but you lose a lot of the speedup that comes with a vector database.
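
To make the trade-off in the comment above concrete, here is a minimal sketch in plain Python that compares an exact "flat" scan with a crude partition-based approximate search and measures recall. It is not Milvus code and does not use any Milvus API; the partitioning scheme, the function names, and the parameters (`nprobe`, 16 centroids, 2000 random 8-dimensional vectors) are all assumptions chosen for illustration.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(vectors, query, k):
    """Exact search: score every vector and return the ids of the k closest."""
    order = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))
    return order[:k]


def build_partitions(vectors, centroids):
    """Assign each vector id to its nearest centroid (a toy IVF-style index)."""
    parts = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        parts[nearest].append(i)
    return parts


def approx_search(vectors, centroids, parts, query, k, nprobe=1):
    """Approximate search: scan only the nprobe partitions closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in parts[c]]
    candidates.sort(key=lambda i: l2(vectors[i], query))
    return candidates[:k]


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true nearest neighbours that the approximate search found."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


# Hypothetical data: random vectors, with centroids sampled from the data itself.
rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids = rng.sample(vectors, 16)
parts = build_partitions(vectors, centroids)
query = [rng.random() for _ in range(8)]

exact = flat_search(vectors, query, 10)
approx = approx_search(vectors, centroids, parts, query, 10, nprobe=2)
full = approx_search(vectors, centroids, parts, query, 10, nprobe=16)

print("recall with nprobe=2: ", recall_at_k(exact, approx))
print("recall with nprobe=16:", recall_at_k(exact, full))
```

Probing every partition degenerates into the flat scan, so recall is complete but nothing is saved; probing fewer partitions scans fewer vectors and may miss some true neighbours. Production indexes use far more sophisticated structures, but the recall-versus-work trade-off has the same shape.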

  • annoy

    Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

  • qdrant

    Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

  • This was to be expected after the recent ES releases. However, dedicated vector search engines offer better performance and more advanced features. Qdrant https://github.com/qdrant/qdrant is written in Rust. It is fast, stable, and super easy to deploy. (Disclaimer: I'm affiliated with the project.)

  • neural-solr

    Neural Solr = Solr 9 + Mighty Inference + Node

  • Dense vector search in Solr is a welcome addition, but getting started requires a lot of pieces that aren’t included.

    So I made this a couple of months ago to make it super easy to get started with this tech. If you have a sitemap, you can start the docker compose and index your website with a single command.

    https://github.com/maxdotio/neural-solr

    Enjoy!

  • bfes-java

    Brute force embedding search

  • I just tried this using my brute force embedding search library that runs on CPUs and it does it in 171s. How often do you need to do this?

    https://github.com/spullara/bfes-java
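
For readers unfamiliar with the idea, brute force embedding search simply scores the query against every stored embedding and keeps the best matches. The sketch below shows that idea in plain Python with cosine similarity; it is an illustrative assumption, not the API or implementation of bfes-java, and the function names and sample vectors are hypothetical.

```python
import heapq
import math


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def brute_force_top_k(embeddings, query, k):
    """Score the query against every embedding; return the k best (score, id) pairs."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(normalize(e), q)), i)
        for i, e in enumerate(embeddings)
    )
    # nlargest keeps only k items in memory and returns them best-first.
    return heapq.nlargest(k, scored)


# Hypothetical two-dimensional embeddings, for illustration only.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
results = brute_force_top_k(embeddings, [1.0, 0.1], k=2)
for score, idx in results:
    print(idx, round(score, 3))
```

The cost grows linearly with the number of embeddings, which is why this approach is exact but becomes slow on large collections unless it is heavily vectorized or parallelized.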
