nearest-neighbor-search

Top 23 nearest-neighbor-search Open-Source Projects

  • Milvus

    A cloud-native vector database, storage for next generation AI applications

    Project mention: Ask HN: Who is hiring? (April 2024) | news.ycombinator.com | 2024-04-01

    Zilliz (zilliz.com) | Hybrid/ONSITE (SF, NYC) | Full-time

    I am part of the hiring team for DevRel

    NYC - https://boards.greenhouse.io/zilliz/jobs/4307910005

    SF - https://boards.greenhouse.io/zilliz/jobs/4317590005

    Zilliz is the company behind Milvus (https://github.com/milvus-io/milvus), the most starred vector database on GitHub. Milvus is a distributed vector database that shines in 1B+ vector use cases. Examples include autonomous driving, e-commerce, and drug discovery. (and, of course, RAG)

    We are also hiring for other roles that I am not personally involved in the hiring process for such as product managers, software engineers, and recruiters.

  • qdrant

    Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

    Project mention: Ask HN: Has Anyone Trained a personal LLM using their personal notes? | news.ycombinator.com | 2024-04-03

    I'm currently looking to implement locally, using QDrant [1] for instance.

    I'm just playing around, but it makes sense to have a runnable example for our users at work too :) [2].

    [1]. https://qdrant.tech/

  • annoy

    Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

    Project mention: Do we think about vector dbs wrong? | news.ycombinator.com | 2023-09-05

    The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy.

    Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy) has been around since 2014. It was one of the first open source approximate nearest neighbor libraries. Recommendations have always been a good use case for vector similarity.

    Recommendations are a natural extension of search and transformers models made building the vectors for natural language possible. To prove the worth of vector search over keyword search, the focus was always on showing how the top N matches include results not possible with keyword search.

    In 2023, there has been a shift towards acknowledging keyword search also has value and that a combination of vector + keyword search (aka hybrid search) operates in the sweet spot. Once again this is validated through the same benchmarks which focus on the top 10.

    On top of all this, there is also the reality that the vector database space is very crowded and some want to use their performance benchmarks for marketing.

    Disclaimer: I am the author of txtai (https://github.com/neuml/txtai), an open source embeddings database
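
    The hybrid search described above (vector and keyword scores combined into one ranking) can be sketched in a few lines of Python. This is a minimal illustration, not how annoy, txtai, or any database on this list implements it: the function names, the `alpha` weight, the toy two-dimensional vectors, and the term-overlap keyword score (a crude stand-in for something like BM25) are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of query terms present in the text (hypothetical stand-in for BM25)."""
    query_terms = query.lower().split()
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return sum(1 for term in query_terms if term in text_terms) / len(query_terms)


def hybrid_search(query, query_vec, docs, alpha=0.5, top_n=10):
    """Rank docs by alpha * vector score + (1 - alpha) * keyword score."""
    scored = []
    for doc_id, (text, vec) in docs.items():
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, doc_id))
    scored.sort(reverse=True)  # highest combined score first
    return [doc_id for _, doc_id in scored[:top_n]]


# Toy corpus: the vectors are made up by hand, not produced by a real model.
docs = {
    "a": ("cheap flights to paris", [1.0, 0.0]),
    "b": ("low cost airfare for france", [0.9, 0.1]),
    "c": ("paris fashion week", [0.0, 1.0]),
}
query, query_vec = "cheap flights paris", [1.0, 0.0]

print(hybrid_search(query, query_vec, docs, alpha=1.0))  # vector only
print(hybrid_search(query, query_vec, docs, alpha=0.0))  # keyword only
print(hybrid_search(query, query_vec, docs, alpha=0.5))  # hybrid
```

    With `alpha=1.0` the ranking is purely vector-based and the paraphrase `b` ranks second; with `alpha=0.0` it is purely keyword-based and `b` drops to last because it shares no terms with the query.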

  • Weaviate

    Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

    Project mention: pgvecto.rs alternatives - qdrant and Weaviate | libhunt.com/r/pgvecto.rs | 2024-03-13

  • pgvector

    Open-source vector similarity search for Postgres

    Project mention: Vector Database solutions on AWS | dev.to | 2024-03-28

    Vector databases on the market fall into two groups: specialized ones and multi-model ones. Most of the major database providers, such as Oracle, PostgreSQL, or MongoDB, to mention a few, have integrated a specific solution to retrieve vector data.

  • Smile

    Statistical Machine Intelligence & Learning Engine

    Project mention: The Current State of Clojure's Machine Learning Ecosystem | news.ycombinator.com | 2024-04-07

    > I don't think it's right to recommend that new users move away from the package because of licensing issues

    I was going to chime in to agree but then I saw how this was done - a completely innocuous looking commit:

    https://github.com/haifengl/smile/commit/6f22097b233a3436519...

    And literally no mention in the release notes:

    https://github.com/haifengl/smile/releases/tag/v3.0.0

    I think if you are going to change license especially in a way that makes it less permissive you need to be super open and clear about both the fact you are doing it and your reasons for that. This is done so silently as to look like it is intentionally trying to mislead and trick people.

    So maybe I wouldn't say to move away because of the specific license, but it's legitimate to avoid something when it's so clearly driven by a single entity and that entity acts in a way that isn't trustworthy.

  • mlpack

    mlpack: a fast, header-only C++ machine learning library

    Project mention: How much C++ is used when it comes to performing quant research? | /r/quant | 2023-07-03

    Does C++ have the equivalent of Pandas or Apache Spark? Are there extensive libraries that exist/are being developed that allow you to perform operations with data? Or do people just use a combination of Python & its various libraries (NumPy etc.)? If we leave aside the data bit, are there libraries that allow you to develop ML models in C++ (mlpack, for instance) faster & more efficiently compared to their Python counterparts (scikit-learn)? On a more general note, how does C++ fit into the routine of a Quant Researcher? And at what scale does an organization decide they need to start switching to other languages and spend more time developing the code?

  • docarray

    Represent, send, store and search multimodal data

    Project mention: DocArray – Represent, send, and store multimodal data for ML | news.ycombinator.com | 2023-04-27

  • usearch

    Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

    Project mention: USearch SQLite Extensions for Vector and Text Search | news.ycombinator.com | 2024-02-22

  • vald

    Vald. A Highly Scalable Distributed Vector Search Engine

    Project mention: What is the reason for using go mod replace like this? | /r/golang | 2023-04-24

  • pgvecto.rs

    Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database.

    Project mention: My binary vector search is better than your FP32 vectors | dev.to | 2024-03-25

    To evaluate the performance metrics in comparison to the original vector approach, we conducted benchmarking using the dbpedia-entities-openai3-text-embedding-3-large-3072-1M dataset. The benchmark was performed on a Google Cloud virtual machine (VM) with specifications of n2-standard-8, which includes 8 virtual CPUs and 32GB of memory. We used pgvecto.rs v0.2.1 as the vector database.
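
    The mention's title contrasts binary vectors with FP32 vectors. As a rough sketch of the general idea only, the Python below keeps one sign bit per dimension and compares vectors by Hamming distance. It is not pgvecto.rs's implementation or API: the function names, the sign-threshold quantization rule, and the toy four-dimensional vectors are assumptions chosen for illustration.

```python
def binarize(vec):
    """Pack the sign of each dimension into one int (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two packed vectors differ."""
    return bin(a ^ b).count("1")


def binary_search(query, vectors, top_n=2):
    """Return the names of the top_n vectors closest to the query in Hamming distance."""
    q_bits = binarize(query)
    ranked = sorted(vectors, key=lambda name: hamming(q_bits, binarize(vectors[name])))
    return ranked[:top_n]


# Toy data, invented for the example (real embeddings have thousands of dimensions).
vectors = {
    "x": [0.9, -0.2, 0.4, -0.7],
    "y": [0.8, -0.1, -0.3, -0.6],
    "z": [-0.5, 0.6, -0.4, 0.3],
}
query = [1.0, -0.3, 0.5, -0.8]

print(binary_search(query, vectors))
```

    Each dimension shrinks from 32 bits to 1, and the distance becomes an XOR plus a bit count. The price is lost precision, which is why binary candidates are often re-ranked against the original float vectors afterwards.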

  • voyager

    🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability. (by spotify)

    Project mention: FLaNK Stack for 04 December 2023 | dev.to | 2023-12-04

  • similarity

    TensorFlow Similarity is a python package focused on making similarity learning quick and easy.

  • soundfingerprinting

    Open source audio fingerprinting in .NET. An efficient algorithm for acoustic fingerprinting written purely in C#.

    Project mention: Ask HN: How many of you are self employed? | news.ycombinator.com | 2024-02-05

    Started 10 years ago as an open-source project, building an algorithm for audio fingerprinting. Added a commercial offering, selling storage built specifically for audio fingerprints, targeting enterprise customers. Since the offering was too technical (it's hard to sell solutions to problems that are too narrow and domain-specific), pivoted to more "business-oriented problems". This last year's pivot is a chance to finally grow. Running a business in single-player mode is, at times, too stressful. Aside from the technical part, which I very much enjoy, I need to wear marketing, sales, and customer support hats.

    [1] - https://emysound.com

  • pynndescent

    A Python nearest neighbor descent for approximate nearest neighbors

    Project mention: [D]: Best nearest neighbour search for high dimensions | /r/MachineLearning | 2023-05-17

    I'll assume this is the link to pynndescent, looks cool! Thanks for sharing. I haven't used it before. Also seems like it's an approximate nearest neighbor algorithm, just FYI for others seeing this.

  • voy

    🕸️🦀 A WASM vector similarity search written in Rust

    Project mention: Ask HN: Semantic Vector Searching in WASM? | news.ycombinator.com | 2024-01-03

    Would this[1] library help you? It's a Rust vector similarity search engine, written to be compiled to Wasm. I discovered it through articles like these[2].

    [1] https://github.com/tantaraio/voy

  • quaterion

    Blazing fast framework for fine-tuning similarity learning models

  • neighbor

    Nearest neighbor search for Rails and Postgres

  • elastiknn

    Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.

  • pgANN

    Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.

    Project mention: Pinecone raises $100M Series B | news.ycombinator.com | 2023-04-27

    Why do you use pgvector instead of pgANN? My understanding is pgANN is built with FAISS. When I compared pgvector with FAISS, pgvector was 3-5x slower.

    https://github.com/netrasys/pgANN

  • TorchPQ

    Approximate nearest neighbor search with product quantization on GPU in pytorch and cuda

  • awesome-vector-database

    A curated list of awesome works related to high dimensional structure/vector search & database

    Project mention: Show HN: SimSIMD vs. SciPy: How AVX-512 and SVE make SIMD cleaner and ML faster | news.ycombinator.com | 2023-10-07

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-07.

Index

What are some of the best open-source nearest-neighbor-search projects? This list will help you:

Rank Project Stars
1 Milvus 26,490
2 qdrant 17,718
3 annoy 12,662
4 Weaviate 9,359
5 pgvector 8,904
6 Smile 5,914
7 mlpack 4,787
8 docarray 2,730
9 usearch 1,611
10 vald 1,451
11 pgvecto.rs 1,364
12 awesome-vector-search 1,257
13 voyager 1,142
14 similarity 994
15 soundfingerprinting 902
16 pynndescent 837
17 voy 696
18 quaterion 619
19 neighbor 417
20 elastiknn 352
21 pgANN 289
22 TorchPQ 201
23 awesome-vector-database 127