np-sims VS fast_vector_similarity

Compare np-sims vs fast_vector_similarity and see how they differ.

np-sims

numpy ufuncs for vector similarity (by softwaredoug)

fast_vector_similarity

The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors. (by Dicklesworthstone)
                 np-sims        fast_vector_similarity
Mentions         2              7
Stars            14             325
Growth           -              -
Activity         8.6            7.2
Latest commit    6 months ago   10 months ago
Language         Python         Rust
License          MIT License    -
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

np-sims

Posts with mentions or reviews of np-sims. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-10-30.
  • Approximate Nearest Neighbors Oh Yeah
    5 projects | news.ycombinator.com | 30 Oct 2023
    I implemented this recently in C as a NumPy extension[1], for fun. I even had a vectorized solution going.

    You'll get diminishing returns on recall pretty fast. There's actually a theorem that tracks this, the Johnson-Lindenstrauss lemma[2], if you're interested.

    As I mention in a talk I gave[3], it can work if you're going to rerank anyway, and if whatever vector search you use isn't the main ranking signal. It's also easy to update, as the hashes are non-parametric (they don't depend on the data).

    The lack of data dependency, however, is the main problem. Vector spaces are lumpy. You can see this in the distribution of human beings on the surface of the earth: postal codes and area codes vary from small to huge. Random hashes, like a grid, wouldn't let you accurately map out the distribution of all the people or clump them close to their actual nearest neighbors. Manhattan is not rural Alaska.

    Annoy actually builds on these hashes by creating a tree of such hashes, finding a split between the left and right at each node, and then creating a forest of such trees. So it's essentially a forest of random hash trees with data dependency.

    Hope that helps.

    1 - https://github.com/softwaredoug/np-sims

  • Show HN: Fast Vector Similarity Using Rust and Python
    8 projects | news.ycombinator.com | 23 Aug 2023
    Nice!

    I recently implemented a C-based NumPy solution of LSH to compress / recover cosine similarity[1]. It was my first time writing NumPy C code, and it was a lot of fun to massively improve the performance over pure Python[2].

    1- https://github.com/softwaredoug/np-sims

    2- https://softwaredoug.com/blog/2023/08/22/rand-projections-in...
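The random-hash idea discussed in these posts (hash each vector by which side of several random hyperplanes it falls on, then recover cosine similarity from how many bits two hashes share) can be sketched in a few lines of NumPy. This is a minimal illustration of sign-random-projection LSH under our own assumptions; the function names are hypothetical, and it is not np-sims' actual API or its C implementation. It relies on the standard property that a random Gaussian hyperplane separates two vectors with probability equal to the angle between them divided by pi.

```python
import numpy as np


def random_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> np.ndarray:
    """Draw n_bits random Gaussian hyperplane normals of length dim.

    The hyperplanes do not depend on the data (non-parametric), so new
    vectors can be hashed at any time without refitting anything.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))


def hash_vectors(vectors: np.ndarray, planes: np.ndarray) -> np.ndarray:
    """One bit per hyperplane: True if the vector lies on its non-negative side."""
    return (vectors @ planes.T) >= 0


def estimate_cosine(bits_a: np.ndarray, bits_b: np.ndarray) -> float:
    """Recover cosine similarity from the fraction of differing bits.

    P(bit differs) = angle / pi, so cos(angle) ~ cos(pi * hamming / n_bits).
    """
    hamming = np.count_nonzero(bits_a != bits_b)
    return float(np.cos(np.pi * hamming / bits_a.shape[-1]))


def exact_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Reference cosine similarity computed on the full vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    dim = 128
    a = rng.standard_normal(dim)
    b = a + 0.8 * rng.standard_normal(dim)  # a noisy neighbour of a
    print(f"exact cosine: {exact_cosine(a, b):.3f}")
    # More bits tighten the estimate, but only roughly as 1/sqrt(n_bits).
    for n_bits in (16, 64, 256, 1024, 4096):
        planes = random_hyperplanes(dim, n_bits, seed=n_bits)
        bits = hash_vectors(np.stack([a, b]), planes)
        print(f"{n_bits:5d} bits -> estimate {estimate_cosine(bits[0], bits[1]):.3f}")
```

Because the error of the estimate shrinks only roughly as 1/sqrt(n_bits), quadrupling the number of bits about halves the error, which is one way to see the diminishing returns mentioned above.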

fast_vector_similarity

Posts with mentions or reviews of fast_vector_similarity. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-07.
  • SentenceTransformers: Python framework for sentence, text and image embeddings
    2 projects | news.ycombinator.com | 7 Apr 2024
    Yes, check out my library for vector similarity that has various other measures which are more discriminative:

    https://github.com/Dicklesworthstone/fast_vector_similarity

    pip install fast_vector_similarity

  • Show HN: Neum AI – Open-source large-scale RAG framework
    3 projects | news.ycombinator.com | 21 Nov 2023
    Got it. I'd encourage you to expose more of that functionality at the level of your application if possible. I think there is a lot of potential in using more than just cosine similarity, especially when there are lots of candidates and you really want to sharpen up the top few recommendations to the best ones. You might find this open-source library I made recently useful for that:

    https://github.com/Dicklesworthstone/fast_vector_similarity

    I've had good results from starting with cosine similarity (using FAISS) and then "enriching" the top results from that with more sophisticated measures of similarity from my library to get the final ranking.

  • Some Reasons to Avoid Cython
    5 projects | news.ycombinator.com | 22 Sep 2023
    You can see how I did something similar in my library here:

    https://github.com/Dicklesworthstone/fast_vector_similarity/...

    Basically you use ndarray instead of numpy, try to vectorize anything you can, and for the for loops that can’t be vectorized, you can use rayon to do them in parallel.

  • FLaNK Stack Weekly 28 August 2023
    27 projects | dev.to | 28 Aug 2023
  • Fast Vector Similarity Library, Useful for Working With Llama2 Embedding Vectors
    1 project | /r/LocalLLaMA | 25 Aug 2023
  • Show HN: Fast Vector Similarity Using Rust and Python
    8 projects | news.ycombinator.com | 23 Aug 2023
    Yeah, like the other commenter said, everything is in this file here:

    https://github.com/Dicklesworthstone/fast_vector_similarity/...

    If you also make your project using Rust and Maturin, you can literally just copy and paste that into your project because it's totally generic, and if the repo is public, GitHub will just run it all for you for free.

    The only thing is you need to create an account on PyPI (pip) and enable two-factor authentication so you can generate an API token. Then you go into the repo settings, go to Secrets, and create a GitHub Actions secret named PYPI_API_TOKEN whose value is your PyPI token. That's it! It will not only compile all the wheels for you but even upload the project to PyPI for you, using the settings found in your pyproject.toml file, like this:

    https://github.com/Dicklesworthstone/fast_vector_similarity/...
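The two-stage ranking described in the Neum AI comment (a fast cosine-similarity pass to get candidates, then a more expensive similarity measure applied only to the top results) could be sketched as follows. Assumptions: brute-force NumPy cosine stands in for a FAISS index, and Spearman's rank correlation is used as one example of a secondary measure. The function names are hypothetical, and this does not use fast_vector_similarity's actual API.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int) -> np.ndarray:
    """Stage 1: brute-force cosine similarity (a stand-in for a FAISS index)."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    # Sort descending and keep the indices of the k best candidates.
    return np.argsort(-scores)[:k]


def rank(x: np.ndarray) -> np.ndarray:
    """Ranks 0..n-1 (ties broken by position; fine for continuous embeddings)."""
    r = np.empty(len(x))
    r[np.argsort(x, kind="stable")] = np.arange(len(x))
    return r


def spearman_rho(a: np.ndarray, b: np.ndarray) -> float:
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


def rerank(query: np.ndarray, corpus: np.ndarray,
           k: int = 10, final: int = 3) -> list[tuple[int, float]]:
    """Stage 2: rescore only the cosine candidates with the costlier measure."""
    candidates = cosine_top_k(query, corpus, k)
    rescored = [(int(i), spearman_rho(query, corpus[i])) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:final]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    corpus = rng.standard_normal((1000, 64))
    query = corpus[123] + 0.1 * rng.standard_normal(64)
    for idx, rho in rerank(query, corpus, k=20, final=3):
        print(idx, round(rho, 3))
```

The design point is cost: the secondary measure runs on only k candidates rather than the whole corpus, so a slower but more discriminative measure stays affordable.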

What are some alternatives?

When comparing np-sims and fast_vector_similarity you can also consider the following projects:

swiss_army_llama - A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

simsimd

qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

QTVR - Tools for QTVR 1 files

DoctorGPT - DoctorGPT provides advanced LLM prompting for PDFs and webpages.

llama_embeddings_fastap
