Np-sims Alternatives
Similar projects and alternatives to np-sims
-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
qdrant
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
-
swiss_army_llama
A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.
-
fast_vector_similarity
The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors.
-
voyager
🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability. (by spotify)
np-sims reviews and mentions
-
Approximate Nearest Neighbors Oh Yeah
I implemented this recently in C as a numpy extension[1], for fun. Even had a vectorized solution going.
You'll get diminishing returns on recall pretty fast. There's actually a theorem that tracks this - the Johnson-Lindenstrauss lemma[2], if you're interested.
As I mention in a talk I gave[3], it can work if you're going to rerank anyway and the vector search isn't the main ranking signal. It's also easy to update, as the hashes are non-parametric (they don't depend on the data).
The lack of data dependency, however, is the main problem. Vector spaces are lumpy. You can see this in the distribution of human beings on the surface of the earth: postal codes and area codes vary from small to huge. Random hashes, like a grid, wouldn't let you accurately map out the distribution of all the people or clump them close to their actual nearest neighbors. Manhattan is not rural Alaska.
Annoy actually builds on these hashes by creating trees of them, finding a split into left and right at each node. Then it creates a forest of such trees. So it's essentially a forest of random hash trees with data dependency.
Hope that helps.
1 - https://github.com/softwaredoug/np-sims
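To make the "forest of random hash trees with data dependency" idea concrete, here is a minimal plain-Python sketch. It is not Annoy's code or API: the function names (`data_dependent_split`, `build_tree`, `build_forest`) and the `leaf_size` parameter are hypothetical, and the split rule (the hyperplane equidistant from two sampled data points) is a simplifying assumption in the spirit of the comment above.

```python
import random


def data_dependent_split(points, rng):
    """Split points by the hyperplane equidistant from two sampled points.

    Unlike a purely random hyperplane, this one is placed where the data is.
    """
    a, b = rng.sample(points, 2)
    normal = [x - y for x, y in zip(a, b)]
    midpoint = [(x + y) / 2.0 for x, y in zip(a, b)]
    left, right = [], []
    for point in points:
        # Signed distance (up to scale) from the splitting hyperplane.
        margin = sum(n * (x - m) for n, x, m in zip(normal, point, midpoint))
        (left if margin >= 0.0 else right).append(point)
    return normal, midpoint, left, right


def build_tree(points, rng, leaf_size=8):
    """Recursively split until each leaf holds at most leaf_size points."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    normal, midpoint, left, right = data_dependent_split(points, rng)
    if not left or not right:
        # Only happens if the two sampled points coincide; stop splitting.
        return {"leaf": points}
    return {
        "normal": normal,
        "midpoint": midpoint,
        "left": build_tree(left, rng, leaf_size),
        "right": build_tree(right, rng, leaf_size),
    }


def build_forest(points, num_trees=4, leaf_size=8, seed=0):
    """Build several independently randomized trees over the same points."""
    return [
        build_tree(points, random.Random(seed + i), leaf_size)
        for i in range(num_trees)
    ]
```

Because every split passes between two actual data points, dense regions get cut more finely than sparse ones, which is the data dependency that purely random hashes lack.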
-
Show HN: Fast Vector Similarity Using Rust and Python
Nice!
I recently implemented a C-based numpy solution of LSH to compress / recover cosine similarity[1]. It was my first time writing Numpy C, and it was a lot of fun to massively improve the performance over pure Python[2].
1- https://github.com/softwaredoug/np-sims
2- https://softwaredoug.com/blog/2023/08/22/rand-projections-in...
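For readers unfamiliar with the idea, here is a rough sketch of how random-hyperplane LSH can compress vectors into bit signatures and recover an approximate cosine similarity from them. It is written in plain Python for portability and is not the np-sims implementation; every function name below is hypothetical. It relies on the standard result that two vectors fall on opposite sides of a random hyperplane with probability equal to their angle divided by pi.

```python
import math
import random


def random_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Compress a vector to one sign bit per hyperplane, packed into an int."""
    signature = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        signature = (signature << 1) | (1 if dot >= 0.0 else 0)
    return signature


def estimated_cosine(sig_a, sig_b, num_bits):
    """Recover an approximate cosine similarity from two signatures.

    The fraction of differing bits estimates angle / pi.
    """
    differing = bin(sig_a ^ sig_b).count("1")
    return math.cos(math.pi * differing / num_bits)


def exact_cosine(a, b):
    """Exact cosine similarity, for comparison with the estimate."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

More bits give a tighter estimate, but the error shrinks only with the square root of the bit count, which is one way to see the diminishing returns mentioned in the other thread.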
Stats
softwaredoug/np-sims is an open source project licensed under the MIT License, which is an OSI-approved license.
The primary programming language of np-sims is Python.