approximate-nearest-neighbor-search

Open-source projects categorized as approximate-nearest-neighbor-search

Top 20 approximate-nearest-neighbor-search Open-Source Projects

  • qdrant

    Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

  • Project mention: Ask HN: Has Anyone Trained a personal LLM using their personal notes? | news.ycombinator.com | 2024-04-03

    I'm currently looking to implement this locally, using Qdrant [1] for instance.

    I'm just playing around, but it makes sense to have a runnable example for our users at work too :) [2].

    [1]. https://qdrant.tech/

  • annoy

    Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

  • Project mention: Do we think about vector dbs wrong? | news.ycombinator.com | 2023-09-05

    The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy.

    Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy) has been around since 2014. It was one of the first open source approximate nearest neighbor libraries. Recommendations have always been a good use case for vector similarity.

    Recommendations are a natural extension of search, and transformer models made building the vectors for natural language possible. To prove the worth of vector search over keyword search, the focus was always on showing how the top N matches include results not possible with keyword search.

    In 2023, there has been a shift towards acknowledging keyword search also has value and that a combination of vector + keyword search (aka hybrid search) operates in the sweet spot. Once again this is validated through the same benchmarks which focus on the top 10.

    On top of all this, there is also the reality that the vector database space is very crowded and some want to use their performance benchmarks for marketing.

    Disclaimer: I am the author of txtai (https://github.com/neuml/txtai), an open source embeddings database

  • Weaviate

    Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

  • Project mention: pgvecto.rs alternatives - qdrant and Weaviate | libhunt.com/r/pgvecto.rs | 2024-03-13
  • pgvector

    Open-source vector similarity search for Postgres

  • Project mention: Vector Database solutions on AWS | dev.to | 2024-03-28

    When talking about vector databases, the market offers both specialized and multi-model options. Most of the major database providers, such as Oracle, PostgreSQL, or MongoDB, to mention some of them, have integrated a specific solution to retrieve vector data.

  • SPTAG

    A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario.

  • hora

    🚀 efficient approximate nearest neighbor search algorithm collections library written in Rust 🦀 .

  • Project mention: Building a Vector Database with Rust to Make Use of Vector Embeddings | /r/rust | 2023-06-01

    We have been playing around with Hora as a replacement for the Rust-CV implementation as we want PQ as well. I'll check out instant-distance, looks very interesting!

  • usearch

    Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

  • Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22
  • vald

    Vald. A Highly Scalable Distributed Vector Search Engine

  • Project mention: What is the reason for using go mod replace like this? | /r/golang | 2023-04-24
  • pynndescent

    A Python nearest neighbor descent for approximate nearest neighbors

  • Project mention: [D]: Best nearest neighbour search for high dimensions | /r/MachineLearning | 2023-05-17

    I'll assume this is the link to pynndescent, looks cool! Thanks for sharing. I haven't used it before. Also seems like it's an approximate nearest neighbor algorithm, just FYI for others seeing this.

  • pecos

    PECOS - Prediction for Enormous and Correlated Spaces

  • big-ann-benchmarks

    Framework for evaluating ANNS algorithms on billion scale datasets.

  • Project mention: Practical Vector Search: NeurIPS 2023 Competition Leaderboard | news.ycombinator.com | 2024-03-01
  • pgANN

    Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.

  • Project mention: Pinecone raises $100M Series B | news.ycombinator.com | 2023-04-27

    Why do you use pgvector instead of pgANN? My understanding is pgANN is built with FAISS. When I compared pgvector with FAISS, pgvector was 3-5x slower.

    https://github.com/netrasys/pgANN

  • instant-distance

    Fast approximate nearest neighbor searching in Rust, based on HNSW index

  • Project mention: Show HN: A fast HNSW implementation in Rust | news.ycombinator.com | 2024-03-14
  • arroy

    Annoy-inspired Approximate Nearest Neighbors in Rust, based on LMDB and optimized for memory usage 💥

  • Project mention: Unveiling arroy: Meilisearch's Latest ANNs Innovation with Rust and LMDB – A Nod to Spotify's Anno | dev.to | 2023-12-01

    For more information and advanced usage, refer to the official Arroy documentation.

  • awesome-vector-database

    A curated list of awesome works related to high dimensional structure/vector search & database

  • Project mention: Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster | news.ycombinator.com | 2023-10-07
  • citrus

    (distributed) vector database (by 0xDebabrata)

  • Project mention: Created a smol vector database in my free time. Looking to provide a LangChain integration soon! | /r/LangChain | 2023-05-06

    It supports all the basic features like creating an index, inserting vectors and searching through them. Here's the GitHub link if anyone's interested in going over it: https://github.com/0xDebabrata/citrus

  • horapy

    🐍 Python binding for the Hora Approximate Nearest Neighbor Search Algorithm library

  • hora-wasm

    WebAssembly binding for the Hora Approximate Nearest Neighbor Search library

  • alvd

    alvd = A Lightweight Vald. A lightweight distributed vector search engine that works without K8s.

  • TileDB-Vector-Search

  • Project mention: Ask HN: Who is hiring? (September 2023) | news.ycombinator.com | 2023-09-01

    - vector search, utilizing TileDB and TileDB Cloud for seamless scaling: https://tiledb.com/blog/why-tiledb-as-a-vector-database (library: https://github.com/TileDB-Inc/TileDB-Vector-Search)
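
The annoy mention in the list above describes hybrid search as a combination of vector and keyword search. One common way to combine two ranked result lists is reciprocal rank fusion; the sketch below illustrates that idea only. It is not taken from txtai or any project listed here, and the function name, the document ids, and the constant k=60 are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both searches (doc_a and doc_c here) outrank those found by only one, which is the behavior the hybrid approach is after.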

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-03.
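
The citrus mention in the list above names the basic features of a vector database: creating an index, inserting vectors, and searching through them. As a rough sketch of those three operations, the toy class below keeps vectors in memory and performs an exact brute-force cosine search. It is not the citrus API; the class and method names are hypothetical, and a real approximate nearest neighbor library would replace the linear scan with a structure such as HNSW or random-projection trees.

```python
import math


class TinyVectorIndex:
    """Hypothetical in-memory index using exact (brute-force) cosine search."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []
        self.vectors = []  # stored unit-normalized

    def _normalize(self, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vectors have no direction to compare")
        return [x / norm for x in vector]

    def insert(self, item_id, vector):
        self.ids.append(item_id)
        self.vectors.append(self._normalize(vector))

    def search(self, query, top_k=3):
        q = self._normalize(query)
        # Dot product of unit vectors equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self.ids, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item_id for _, item_id in scored[:top_k]]


index = TinyVectorIndex(dim=2)
index.insert("east", [1.0, 0.0])
index.insert("north", [0.0, 1.0])
index.insert("north_east", [1.0, 1.0])

nearest = index.search([0.9, 0.1], top_k=2)
print(nearest)
```

The linear scan costs time proportional to the number of stored vectors per query, which is the cost approximate methods trade a little accuracy to avoid.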

Index

What are some of the best open-source approximate-nearest-neighbor-search projects? This list will help you:

Rank Project Stars
1 qdrant 17,718
2 annoy 12,662
3 Weaviate 9,436
4 pgvector 9,067
5 SPTAG 4,693
6 hora 2,552
7 usearch 1,611
8 vald 1,453
9 pynndescent 837
10 pecos 489
11 big-ann-benchmarks 291
12 pgANN 289
13 instant-distance 281
14 arroy 171
15 awesome-vector-database 127
16 citrus 92
17 horapy 69
18 hora-wasm 51
19 alvd 50
20 TileDB-Vector-Search 44
