LASER vs fast_vector_similarity

LASER

Language-Agnostic SEntence Representations (by facebookresearch)

fast_vector_similarity

The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors. (by Dicklesworthstone)


LASER	Project	fast_vector_similarity
5	Mentions	7
3,520	Stars	324
0.3%	Growth	-
5.7	Activity	7.2
7 days ago	Latest Commit	8 months ago
Jupyter Notebook	Language	Rust
GNU General Public License v3.0 or later	License	-

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

LASER

Posts with mentions or reviews of LASER. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-07.

SentenceTransformers: Python framework for sentence, text and image embeddings
2 projects | news.ycombinator.com | 7 Apr 2024

I'm curious how people are handling multi-lingual embeddings.
I've found LASER[1] which originally had the idea to embed all languages in the same vector space, though it's a bit harder to use than models available through SentenceTransformers. LASER2 stuck with this approach, but LASER3 switched to language-specific models. However, I haven't found benchmarks for these models, and they were released about 2 years ago.
Another alternative would be to translate everything before embedding, which would introduce some amount of error, though maybe it wouldn't be significant.
1. https://github.com/facebookresearch/LASER
[D] Hey Reddit! We're a bunch of research scientists and software engineers and we just open sourced a new state-of-the-art AI model that can translate between 200 different languages. We're excited to hear your thoughts so we're hosting an AMA on 07/21/2022 @ 9:00AM PT. Ask Us Anything!
10 projects | /r/MachineLearning | 21 Jul 2022

You can check out some of our materials and open sourced artifacts here:
- Our latest blog post: https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation
- Project Overview: https://ai.facebook.com/research/no-language-left-behind/
- Product demo: https://nllb.metademolab.com/
- Research paper: https://research.facebook.com/publications/no-language-left-behind
- NLLB-200: https://github.com/facebookresearch/fairseq/tree/nllb
- FLORES-200: https://github.com/facebookresearch/flores
- LASER3: https://github.com/facebookresearch/LASER
Joining us today for the AMA are:
- Angela Fan (AF), Research Scientist
- Jean Maillard (JM), Research Scientist
- Maha Elbayad (ME), Research Scientist
- Philipp Koehn (PK), Research Scientist
- Shruti Bhosale (SB), Software Engineer
We’ll be here from 07/21/2022 @09:00AM PT - 10:00AM PT.
Thanks and we’re looking forward to answering your questions!
School project : sentiments analysis with my country Arabic Dialect
1 project | /r/datascience | 3 Nov 2021

This may be helpful: https://github.com/facebookresearch/LASER
[P] Bilingual text alignment tools for NMT - help needed
2 projects | /r/MachineLearning | 4 Oct 2021

Check FB's LASER: https://github.com/facebookresearch/LASER/tree/master/tasks/CCMatrix
Also, Sentence-Transformers has a pretty neat model for crosslingual sentence similarity: https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual
Help with aligned word embeddings
3 projects | /r/LanguageTechnology | 4 May 2021

You want LASER. It's a super big model trained on tons of languages; you can use it with sentence_transformers in Python to compute embeddings. Then you can use faiss or datasketch to find matches at K.
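
The "find matches at K" step in the post above can be illustrated with a brute-force search. This is only a minimal sketch under stated assumptions: it does not call LASER, sentence_transformers, faiss, or datasketch, the function name `top_k_cosine` is hypothetical, and the three-dimensional vectors are made-up stand-ins for real sentence embeddings. A library such as faiss would replace the exhaustive scan on large corpora.

```python
import numpy as np

def top_k_cosine(query, corpus, k):
    """Return (indices, scores) of the k corpus rows most similar to query.

    Hypothetical helper for illustration; assumes no zero-length vectors.
    """
    corpus = np.asarray(corpus, dtype=float)
    query = np.asarray(query, dtype=float)
    # Normalise so that a plain dot product equals cosine similarity.
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = corpus_unit @ query_unit
    # Sort by descending similarity and keep the best k.
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy "embeddings" (invented values, not produced by any model).
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([1.0, 0.05, 0.0])

indices, scores = top_k_cosine(query, corpus, k=2)
print(indices, scores)
```

Because every language is meant to share one vector space, the same search could in principle match a sentence against candidates in another language, provided both sides were embedded by the same model.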

fast_vector_similarity

Posts with mentions or reviews of fast_vector_similarity. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-07.

SentenceTransformers: Python framework for sentence, text and image embeddings
2 projects | news.ycombinator.com | 7 Apr 2024

Yes, check out my library for vector similarity that has various other measures which are more discriminative:
https://github.com/Dicklesworthstone/fast_vector_similarity
pip install fast_vector_similarity
Show HN: Neum AI – Open-source large-scale RAG framework
3 projects | news.ycombinator.com | 21 Nov 2023

Got it. I'd encourage you to expose more of that functionality at the level of your application if possible. I think there is a lot of potential in using more than just cosine similarity, especially when there are lots of candidates and you really want to sharpen up the top few recommendations to the best ones. You might find this open-source library I made recently useful for that:
https://github.com/Dicklesworthstone/fast_vector_similarity
I've had good results from starting with cosine similarity (using FAISS) and then "enriching" the top results from that with more sophisticated measures of similarity from my library to get the final ranking.
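
The two-stage approach described in this post (a cheap cosine pass, then a costlier measure on the shortlist) might look roughly like the sketch below. It is an illustration under assumptions, not the author's implementation: it uses plain NumPy rather than FAISS or fast_vector_similarity, it uses Spearman rank correlation as a stand-in for the "more sophisticated measures", and the names `cosine_scores`, `spearman`, and `shortlist_then_rerank` are hypothetical. The data is random and seeded only for reproducibility.

```python
import numpy as np

def cosine_scores(query, corpus):
    """Cosine similarity of query against every row of corpus."""
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return corpus_unit @ (query / np.linalg.norm(query))

def spearman(a, b):
    """Spearman rank correlation; assumes no tied values within a or b."""
    rank_a = np.argsort(np.argsort(a)).astype(float)
    rank_b = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

def shortlist_then_rerank(query, corpus, shortlist_size, final_k):
    # Stage 1: cheap cosine similarity over the whole corpus.
    coarse = cosine_scores(query, corpus)
    shortlist = np.argsort(-coarse)[:shortlist_size]
    # Stage 2: costlier measure, computed only for the shortlist.
    fine = np.array([spearman(query, corpus[i]) for i in shortlist])
    reranked = shortlist[np.argsort(-fine)]
    return reranked[:final_k]

# Synthetic data: the query is a lightly perturbed copy of row 17.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(200, 32))
query = corpus[17] + 0.01 * rng.normal(size=32)

best = shortlist_then_rerank(query, corpus, shortlist_size=20, final_k=5)
print(best)
```

The design point is that the second measure is evaluated only `shortlist_size` times, so its cost stays bounded however large the corpus grows.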
Some Reasons to Avoid Cython
5 projects | news.ycombinator.com | 22 Sep 2023

You can see how I did something similar in my library here:
https://github.com/Dicklesworthstone/fast_vector_similarity/...
Basically you use ndarray instead of numpy, try to vectorize anything you can, and for the for loops that can’t be vectorized, you can use rayon to do them in parallel.
FLaNK Stack Weekly 28 August 2023
27 projects | dev.to | 28 Aug 2023
Fast Vector Similarity Library, Useful for Working With Llama2 Embedding Vectors
1 project | /r/LocalLLaMA | 25 Aug 2023
Show HN: Fast Vector Similarity Using Rust and Python
8 projects | news.ycombinator.com | 23 Aug 2023

Yeah, like the other commenter said, everything is in this file here:
https://github.com/Dicklesworthstone/fast_vector_similarity/...
If you also make your project using Rust and Maturin, you can literally just copy and paste that into your project because it's totally generic, and if the repo is public, GitHub will just run it all for you for free.
The only thing is you need to create an account on PyPI (pip) and add 2-Factor Auth so you can generate an API key. Then you go into the repo settings and go to secrets, and create a GitHub Actions secret with the name PYPI_API_TOKEN and make the value your PyPI token. That's it! It will not only compile all the wheels for you but even upload the project to PyPI for you using the settings found in your pyproject.toml file, like this:
https://github.com/Dicklesworthstone/fast_vector_similarity/...

What are some alternatives?

When comparing LASER and fast_vector_similarity you can also consider the following projects:

MUSE - A library for Multilingual Unsupervised or Supervised word Embeddings

simsimd

electra - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

swiss_army_llama - A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends

np-sims - numpy ufuncs for vector similarity

fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

QTVR - Tools for QTVR 1 files

flores - Facebook Low Resource (FLoRes) MT Benchmark

llama_embeddings_fastap

DoctorGPT - 💻📚💡 DoctorGPT provides advanced LLM prompting for PDFs and webpages.

qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
