Metric | textaugment | magnitude
---|---|---
Mentions | 2 | 5
Stars | 372 | 1,611
Growth | 1.3% | 0.1%
Activity | 4.6 | 0.0
Last commit | 2 months ago | 9 months ago
Language | Python | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
textaugment
- NLP augmentation models
I just came across this Python library. It has a bunch of dictionary-, backtranslation- and knowledge-based heuristics that should work most of the time:
- Prefer volume or quality for BERT-based Text classification model
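To illustrate what a dictionary-based augmentation heuristic of the kind mentioned above might look like, here is a minimal sketch in plain Python. It does not use textaugment's API: the `SYNONYMS` table and the `synonym_replace` function are hypothetical names made up for this example, and a real setup would draw synonyms from a lexical resource such as WordNet.

```python
import random

# Hypothetical toy synonym dictionary (an assumption for this sketch);
# a real pipeline would use a lexical resource such as WordNet.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "cheerful"],
    "big": ["large", "huge"],
}


def synonym_replace(sentence, n=1, seed=None):
    """Replace up to n words that have an entry in SYNONYMS."""
    rng = random.Random(seed)
    words = sentence.split()
    # Positions of the words we know synonyms for.
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


if __name__ == "__main__":
    # Produces a variant of the sentence with "quick" and "happy" swapped out.
    print(synonym_replace("the quick dog is happy", n=2, seed=0))
```

Calling the function several times with different seeds yields several variants of one labeled sentence, which is the basic idea behind augmenting a small text classification dataset.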
magnitude
- Text Classification Library for a Quick Baseline
(3) FastText now supports multiple languages [2].
[1] https://github.com/plasticityai/magnitude#pre-converted-magn...
- Pgvector – vector similarity search for Postgres
Check out Magnitude, we built it to solve that problem: https://github.com/plasticityai/magnitude
It's still loaded from a file, but heavily uses memory-mapping and caching to be speedy and not overload your RAM immediately. And in production scenarios, multiple worker processes can share that memory due to the memory mapping.
Disclaimer: I'm the author.
- Build an Embeddings index from a data source
General language models from pymagnitude
- Tutorial series on txtai
Backed by the pymagnitude library. Pre-trained word vectors can be installed from the referenced link.
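The memory-mapping idea described in the Pgvector mention above can be sketched with Python's standard library alone. This is not Magnitude's actual file format or code: the flat float32 layout, the file name, and the helpers `write_vectors`, `read_vector`, and `demo` are assumptions chosen to keep the example small.

```python
import mmap
import os
import struct
import tempfile

DIM = 3  # toy vector dimensionality (assumption for this sketch)
ROW = struct.Struct(f"<{DIM}f")  # one row of little-endian float32 values


def write_vectors(path, vectors):
    """Write fixed-width float32 rows to a flat binary file."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(ROW.pack(*vec))


def read_vector(mm, index):
    """Decode row `index` directly from the memory map."""
    return ROW.unpack_from(mm, index * ROW.size)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
    write_vectors(path, [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
    with open(path, "rb") as f:
        # Map the whole file read-only; the OS pages data in on access
        # instead of reading the full file into memory up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return read_vector(mm, 1)


if __name__ == "__main__":
    print(demo())
```

Because the mapping is read-only and backed by the file, several worker processes that map the same file can typically share the same cached pages through the operating system, which is the effect the author describes for production use.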
What are some alternatives?
AugLy - A data augmentations library for audio, image, text, and video.
flashtext - Extract Keywords from sentence or Replace keywords in sentences.
word_forms - Accurately generate all possible forms of an English word, e.g. "election" --> "elect", "electoral", "electorate", etc.
faiss - A library for efficient similarity search and clustering of dense vectors.
scattertext - Beautiful visualizations of how language differs among document types.
pgvector - Open-source vector similarity search for Postgres
wordnet - Stand-alone WordNet API
finalfusion-rust - finalfusion embeddings in Rust
tfops-aug - TFOps-Aug: Implementation of policy-based image augmentation techniques based on TF2 Operations. All augmentations as efficient Tensorflow 2.11.0 operations. Easy integration into a tf.data API pipeline.
Milvus - A cloud-native vector database, storage for next generation AI applications
txtai - 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Resume-Matcher - Resume Matcher is an open source, free tool to improve your resume. It works by using language models to compare and rank resumes with job descriptions.