finalfusion-rust
magnitude
| | finalfusion-rust | magnitude |
|---|---|---|
| Mentions | 1 | 5 |
| Stars | 88 | 1,612 |
| Growth | - | 0.1% |
| Activity | 6.3 | 0.0 |
| Latest commit | 7 months ago | 10 months ago |
| Language | Rust | Python |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
finalfusion-rust
-
Compressing high-dimensional vectors by 97%
Nice article that explains product quantization very well!
PQ is really a nice compression technique. I implemented PQ and Optimized PQ [1] a while back in our word embedding package for Rust:
https://github.com/finalfusion/finalfusion-rust/
https://github.com/finalfusion/reductive/
Optimized PQ in particular was effective, reducing vector sizes roughly tenfold with virtually no reconstruction loss. This made it much easier to ship models (no more 3GB embedding matrix next to a neural net that is only a few megabytes in size).
[1] http://kaiminghe.com/publications/pami13opq.pdf
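To make the compression idea concrete, here is a minimal pure-Python sketch of plain product quantization: each vector is split into equal-sized subvectors, a small k-means codebook is trained per subspace, and every vector is then stored as one centroid index per subspace. This is standard PQ under simple assumptions (Lloyd's k-means, random Gaussian toy data); it is not the Optimized PQ variant from [1], which additionally learns a rotation, and it is not the finalfusion or reductive API. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, p):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], p))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids

def train_pq(vectors, n_subspaces, n_centroids):
    """Train one k-means codebook per subspace (hypothetical helper)."""
    dim = len(vectors[0])
    assert dim % n_subspaces == 0, "dimension must split evenly"
    sub = dim // n_subspaces
    return [kmeans([tuple(v[s * sub:(s + 1) * sub]) for v in vectors],
                   n_centroids, seed=s)
            for s in range(n_subspaces)]

def encode(v, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    sub = len(codebooks[0][0])
    return [nearest(cb, v[s * sub:(s + 1) * sub]) for s, cb in enumerate(codebooks)]

def decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    out = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out

# Toy demo: 200 random 8-dimensional vectors, 4 subspaces, 16 centroids each.
rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
books = train_pq(data, n_subspaces=4, n_centroids=16)
codes = [encode(v, books) for v in data]
```

With 16 centroids per subspace, each code fits in a single byte, so an 8-dimensional float32 vector (32 bytes) shrinks to 4 bytes plus the shared codebooks, roughly an 8x reduction in this toy setting. More centroids per subspace lower the reconstruction error at the cost of larger codebooks and slower encoding.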
magnitude
-
Text Classification Library for a Quick Baseline
(3) FastText now supports multiple languages [2].
[1] https://github.com/plasticityai/magnitude#pre-converted-magn...
-
Pgvector – vector similarity search for Postgres
Check out Magnitude; we built it to solve that problem: https://github.com/plasticityai/magnitude
It's still loaded from a file, but it relies heavily on memory mapping and caching, so lookups stay fast without loading the whole file into RAM up front. In production, multiple worker processes can also share that memory, because the file is memory-mapped.
Disclaimer: I'm the author.
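The memory-mapping approach described in that comment can be illustrated with Python's standard `mmap` module. The sketch below assumes a hypothetical flat file of little-endian float32 rows; it is not Magnitude's actual file format or API. Rows are decoded on demand straight from the mapping, and because the mapping is read-only and file-backed, the operating system can serve the same cached pages to several processes that map the same file.

```python
import mmap
import os
import struct
import tempfile

DIM = 4                # hypothetical vector dimension
ROW_BYTES = DIM * 4    # each float32 takes 4 bytes

def write_matrix(path, rows):
    """Write rows as a flat little-endian float32 matrix (hypothetical layout)."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"<{DIM}f", *row))

class MappedVectors:
    """Read-only, lazily decoded view over a flat float32 matrix file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.n_rows = len(self._mm) // ROW_BYTES

    def __getitem__(self, i):
        if not 0 <= i < self.n_rows:
            raise IndexError(i)
        # Only the pages holding row i are touched; nothing else is read here.
        return struct.unpack_from(f"<{DIM}f", self._mm, i * ROW_BYTES)

    def close(self):
        self._mm.close()
        self._file.close()

# Demo: write 1,000 rows, then map the file and fetch a single row.
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
write_matrix(path, [[float(i)] * DIM for i in range(1000)])
vecs = MappedVectors(path)
row = vecs[42]
```

Separate worker processes that each open and map `vectors.bin` this way read through the OS page cache rather than holding private copies of the matrix.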
-
Build an Embeddings index from a data source
General language models from pymagnitude
-
Tutorial series on txtai
Backed by the pymagnitude library. Pre-trained word vectors can be installed from the referenced link.
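Several of the mentions above (the txtai tutorials and the Postgres similarity-search thread) come down to building an index of embeddings from text and searching it by similarity. The following is a minimal sketch under stated assumptions: tiny hand-written toy word vectors instead of a pre-trained model, mean pooling of word vectors per document, and brute-force cosine search. It is not how txtai, pymagnitude, or pgvector implement this, and every name in it is hypothetical.

```python
import math

# Hypothetical toy word vectors; a real index would load pre-trained ones.
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.05, 0.0, 0.95],
}

def embed(text):
    """Mean-pool the vectors of known tokens, then L2-normalize."""
    vecs = [WORD_VECTORS[t] for t in text.lower().split() if t in WORD_VECTORS]
    if not vecs:
        return None
    mean = [sum(xs) / len(vecs) for xs in zip(*vecs)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else None

class EmbeddingsIndex:
    """Brute-force in-memory index of unit-length document vectors."""

    def __init__(self):
        self.ids, self.vectors = [], []

    def index(self, documents):
        for doc_id, text in documents:
            v = embed(text)
            if v is not None:  # skip documents with no known words
                self.ids.append(doc_id)
                self.vectors.append(v)

    def search(self, query, k=3):
        q = embed(query)
        if q is None:
            return []
        # All vectors are unit length, so the dot product is the cosine similarity.
        scored = [(doc_id, sum(a * b for a, b in zip(q, v)))
                  for doc_id, v in zip(self.ids, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

# Demo: index three short documents and query with a single word.
idx = EmbeddingsIndex()
idx.index([(0, "my cat and dog"), (1, "stock market news"), (2, "pet food")])
results = idx.search("dog", k=2)
```

A brute-force scan is fine for a handful of documents; dedicated tools such as the similarity-search libraries in the list below replace it with approximate nearest-neighbor indexes once collections grow large.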
What are some alternatives?
excalidraw-animate - A tool to animate Excalidraw drawings
flashtext - Extract keywords from sentences or replace keywords in sentences.
excalidraw - Virtual whiteboard for sketching hand-drawn like diagrams
faiss - A library for efficient similarity search and clustering of dense vectors.
cleora - Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.
pgvector - Open-source vector similarity search for Postgres
Milvus - A cloud-native vector database, storage for next generation AI applications
txtai - 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Resume-Matcher - Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.
pretty-print-confusion-matrix - Confusion Matrix in Python: plot a pretty confusion matrix (like MATLAB) in Python using seaborn and matplotlib
Romanian-Word-Embeddings - Romanian Word Embeddings. Here you can find pre-trained corpora of word embeddings. Current methods: CBOW, Skip-Gram, Fast-Text (from Gensim library). The .vec and .model files are available for download (all in one archive).
sentence-transformers - Multilingual Sentence & Image Embeddings with BERT