Vectors are over, hashes are the future

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ann-benchmarks

    Benchmarks of approximate nearest neighbor libraries in Python

  • It seems the author is proposing LSH instead of vectors for ANN?

    There are benchmarks here, http://ann-benchmarks.com/, but LSH underperforms state-of-the-art ANN algorithms like HNSW on recall/throughput.

    I believe LSH was state of the art about ten years ago, but it has since been surpassed. That said, the caching aspect is really nice.

  • Milvus

    A cloud-native vector database, storage for next generation AI applications

  • Hashes are great, but to say that "vectors are over" is just plain nonsense. We continue to see vectors as a core part of production systems for entity representation and recommendation (example: https://slack.engineering/recommend-api) and within models themselves (example: multimodal and diffusion models). For folks into metrics, we're building a vector database specifically for storing, indexing, and searching across massive quantities of vectors (https://github.com/milvus-io/milvus), and we've seen close to exponential growth in terms of total downloads.

    Vectors are just getting started.

  • gaoya

    Locality Sensitive Hashing

  • https://github.com/serega/gaoya

    An HNSW index is slow to construct, so it is best suited to search or recommendation engines where you build the index once and then serve it. For workloads where you continuously mutate the index, such as streaming clustering or deduplication, LSH outperforms HNSW.
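The comment about LSH versus state-of-the-art ANN indexes is easier to follow with a concrete LSH index in hand. Below is a minimal sketch of random-hyperplane LSH for cosine similarity in plain Python, so it runs without dependencies. It is an illustration only: it is not the scheme from the original post and not one of the implementations measured by ann-benchmarks, and the class name, the parameters `n_bits` and `n_tables`, and the exact-rerank step are assumptions chosen for the example. Each vector is hashed to a short bit signature (one bit per random hyperplane), items sharing a bucket become candidates, and candidates are reranked by exact cosine similarity.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity (illustrative sketch)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # For each table, draw n_bits random Gaussian hyperplanes through the origin.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(set) for _ in range(n_tables)]
        self.vectors = {}

    @staticmethod
    def _signature(planes, vec):
        # One bit per hyperplane: which side of it the vector lies on.
        return tuple(
            1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
            for plane in planes
        )

    def add(self, key, vec):
        self.vectors[key] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(planes, vec)].add(key)

    def query(self, vec, k=1):
        # Candidates share a bucket with vec in at least one table;
        # they are then reranked by exact cosine similarity.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates |= table.get(self._signature(planes, vec), set())
        ranked = sorted(
            candidates, key=lambda c: cosine(vec, self.vectors[c]), reverse=True
        )
        return ranked[:k]


# Usage: index 200 random 16-dimensional vectors, then query with a scaled copy.
rng = random.Random(42)
data = {f"doc{i}": [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(200)}
index = HyperplaneLSH(dim=16)
for key, vec in data.items():
    index.add(key, vec)
print(index.query([2 * x for x in data["doc7"]]))  # ['doc7']
```

Scaling a vector by a positive constant does not change which side of any hyperplane it falls on, so the scaled copy lands in exactly the same buckets as the original. Raising `n_tables` tends to improve recall at the cost of memory and query time, while raising `n_bits` makes each bucket smaller and more selective.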
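The gaoya comment's point about continuously mutated indexes comes down to how little work an LSH insert or delete needs: each item touches one bucket per band, and there is no graph structure to update. Here is a minimal MinHash LSH sketch with banding for near-duplicate text detection, in plain Python. It does not use or reproduce gaoya's API; the class name, the character-trigram shingles, the `num_perm` and `bands` values, and the use of salted BLAKE2b hashes as a stand-in for independent random permutations are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Set of lowercase character k-grams (an assumed, simple feature choice)."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


class MinHashLSH:
    """MinHash LSH with banding; supports insert, remove and query (sketch)."""

    def __init__(self, num_perm=64, bands=16):
        if num_perm % bands:
            raise ValueError("num_perm must be divisible by bands")
        self.num_perm = num_perm
        self.bands = bands
        self.rows = num_perm // bands
        self.buckets = [defaultdict(set) for _ in range(bands)]
        self.signatures = {}

    def _minhash(self, features):
        # Salting one hash function with the index i stands in for
        # num_perm independent random permutations.
        return [
            min(
                int.from_bytes(
                    hashlib.blake2b(f"{i}:{f}".encode(), digest_size=8).digest(), "big"
                )
                for f in features
            )
            for i in range(self.num_perm)
        ]

    def _bands(self, sig):
        # Split the signature into `bands` chunks of `rows` values each.
        for b in range(self.bands):
            yield b, tuple(sig[b * self.rows:(b + 1) * self.rows])

    def insert(self, key, text):
        if key in self.signatures:
            self.remove(key)  # re-inserting a key replaces its old entry
        sig = self._minhash(shingles(text))
        self.signatures[key] = sig
        for b, band in self._bands(sig):
            self.buckets[b][band].add(key)

    def remove(self, key):
        # Deletion only touches one bucket per band; nothing else is rebuilt.
        sig = self.signatures.pop(key)
        for b, band in self._bands(sig):
            bucket = self.buckets[b][band]
            bucket.discard(key)
            if not bucket:
                del self.buckets[b][band]

    def query(self, text):
        # Keys that agree with the query on every row of at least one band.
        sig = self._minhash(shingles(text))
        found = set()
        for b, band in self._bands(sig):
            found |= self.buckets[b].get(band, set())
        return found


# Usage: insert two documents, query, then delete one as a stream would.
index = MinHashLSH()
index.insert("a", "the quick brown fox jumps over the lazy dog")
index.insert("b", "completely unrelated sentence about databases")
print(index.query("the quick brown fox jumps over the lazy dog"))  # {'a'}
index.remove("a")
print(index.query("the quick brown fox jumps over the lazy dog"))  # set()
```

A near-duplicate, such as the same sentence with an extra punctuation mark, shares most of its shingles with the original and therefore matches it in at least one band with high probability. The split of `num_perm` into `bands` times `rows` controls the similarity level at which that match probability rises steeply.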
