citrus
pynndescent
Metric | citrus | pynndescent
---|---|---
Mentions | 1 | 4
Stars | 93 | 841
Growth | - | -
Activity | 7.6 | 6.3
Last commit | about 1 month ago | about 1 month ago
Language | Python | Python
License | Apache License 2.0 | BSD 2-clause "Simplified" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
citrus
- Created a smol vector database in my free time. Looking to provide a LangChain integration soon!
It supports all the basic features like creating an index, inserting vectors and searching through them. Here's the GitHub link if anyone's interested in going over it: https://github.com/0xDebabrata/citrus
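The three operations mentioned in that post (creating an index, inserting vectors, searching) can be illustrated with a tiny brute-force sketch. This is not the citrus API: the class `TinyVectorIndex` and its method names are hypothetical, and it uses an exact linear scan with Euclidean distance rather than whatever indexing strategy citrus actually implements.

```python
import heapq
import math


class TinyVectorIndex:
    """Hypothetical in-memory vector index (illustration only, not citrus)."""

    def __init__(self, dim):
        self.dim = dim
        self.ids = []
        self.vectors = []

    def insert(self, vec_id, vector):
        # Reject vectors whose dimensionality does not match the index.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.ids.append(vec_id)
        self.vectors.append(list(vector))

    def search(self, query, k=3):
        """Return up to k (id, distance) pairs, closest first (exact scan)."""
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(query)}")
        scored = ((math.dist(query, v), i) for v, i in zip(self.vectors, self.ids))
        return [(i, d) for d, i in heapq.nsmallest(k, scored)]


index = TinyVectorIndex(dim=2)
index.insert("a", [0.0, 0.0])
index.insert("b", [1.0, 0.0])
index.insert("c", [5.0, 5.0])

results = index.search([0.9, 0.1], k=2)
print(results)  # ids of the two closest vectors, nearest first
```

A linear scan like this costs one distance computation per stored vector on every query, which is the cost that approximate nearest neighbour libraries try to avoid on large collections.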
pynndescent
- [D]: Best nearest neighbour search for high dimensions
I'll assume this is the link to pynndescent, looks cool! Thanks for sharing. I haven't used it before. Also seems like it's an approximate nearest neighbor algorithm, just FYI for others seeing this.
- How to find "k" nearest embeddings in a space with a very large number of N embeddings (efficiently)?
If you just want quick in memory search then pynndescent is a decent option: it's easy to install, and easy to get running. Another good option is Annoy; it's just as easy to install and get running with python, but it is a little less performant if you want to do a lot of queries, or get a knn-graph quickly.
- PynnDescent: Importing pickled index gives error - 'NNDescent' object has no attribute 'shape'
Using the latest version of PyNNDescent via pip install. Running this on Google Colab with Python 3.7.13. Followed the docs and created an index with the params `pynnindex = pynndescent.NNDescent(arr, metric="cosine", n_neighbors=100)`. Everything works fine and I get results from `pynnindex.neighbor_graph` as expected.
- [D] In UMAP and PyNNDescent, the conversion of Cosine and Correlation measures to distance metric seems problematic
PyNNDescent distances.py: pynndescent/distances.py at master · lmcinnes/pynndescent (github.com)
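The excerpt above does not say what the poster found problematic. One well-known property that may be relevant (an assumption, not a claim about the post or about how `distances.py` is written) is that the common conversion `1 - cosine_similarity` is not a true metric, because it can violate the triangle inequality, whereas the normalised angle between vectors does satisfy it. A small standard-library sketch with hypothetical helper names:

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    # Common conversion: 1 - similarity. Not guaranteed to be a metric.
    return 1.0 - cosine_similarity(u, v)


def angular_distance(u, v):
    # Angle between the vectors, scaled to [0, 1]. Clamp for float safety.
    sim = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(sim) / math.pi


a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

cos_ab, cos_bc, cos_ac = cosine_distance(a, b), cosine_distance(b, c), cosine_distance(a, c)
ang_ab, ang_bc, ang_ac = angular_distance(a, b), angular_distance(b, c), angular_distance(a, c)

# Triangle inequality requires d(a, c) <= d(a, b) + d(b, c).
print("cosine :", cos_ab + cos_bc, "vs", cos_ac)
print("angular:", ang_ab + ang_bc, "vs", ang_ac)
```

For these three vectors the detour through `b` is shorter than the direct cosine distance from `a` to `c`, which a true metric would not allow. Whether that matters in practice depends on how the nearest neighbour search uses the distance.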
What are some alternatives?
pgANN - Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.
umap - Uniform Manifold Approximation and Projection
vector-db-benchmark - Framework for benchmarking vector search engines
annoy - Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
vector-search-compilation - A compilation of Vector Search Databases
ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python
vald - Vald. A Highly Scalable Distributed Vector Search Engine
faiss - A library for efficient similarity search and clustering of dense vectors.
awesome-vector-database - A curated list of awesome works related to high dimensional structure/vector search & database
qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Milvus - A cloud-native vector database, storage for next generation AI applications