Nearest-neighbor search in high-dimensional spaces

This page summarizes the projects mentioned and recommended in the original post on /r/compsci.

  • google-research

    Google Research

  • Don't roll your own solution; use ScaNN (https://github.com/google-research/google-research/tree/master/scann) or Faiss (https://github.com/facebookresearch/faiss). I used the internal version of ScaNN while I was at Google and found it incredibly well put together. I can't speak to the open-source version, but it should be similarly good. Either library might be overkill given your set sizes, but using one will still be easier than building your own fix.

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

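The comment above notes that these libraries may be overkill for small sets. For comparison, here is a minimal sketch of exact brute-force k-nearest-neighbor search using only the Python standard library. It is meant as a baseline for sanity-checking a library's results on small inputs, not as a replacement for ScaNN or Faiss. The function name `brute_force_knn`, the choice of Euclidean distance, and the random demo data are assumptions made for illustration; none of them come from the original post.

```python
import heapq
import math
import random


def brute_force_knn(points, query, k):
    """Return the k (distance, index) pairs closest to `query`, nearest first.

    Exact search: computes the Euclidean distance to every point, so the
    cost is O(n * d) per query. Hypothetical helper for illustration only.
    """
    scored = ((math.dist(p, query), i) for i, p in enumerate(points))
    return heapq.nsmallest(k, scored)


# Demo on made-up data: 1,000 random points in 64 dimensions.
random.seed(0)
dim = 64
data = [[random.random() for _ in range(dim)] for _ in range(1000)]
query = [random.random() for _ in range(dim)]

for dist, idx in brute_force_knn(data, query, k=5):
    print(f"index={idx} distance={dist:.4f}")
```

Because every query scans the whole set, this approach stops being practical as the collection grows, and that is where the approximate indexes in the recommended libraries earn their keep.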
