Opus-MT
fastText
| | Opus-MT | fastText |
|---|---|---|
| Mentions | 3 | 8 |
| Stars | 527 | 25,505 |
| Growth | 8.7% | - |
| Activity | 4.8 | 6.0 |
| Last commit | 4 days ago | about 2 months ago |
| Language | Python | HTML |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Opus-MT
- “sync, corrected by elderman” issue in ML translation datasets spread across the internet
- mentioned in an issue on the GitHub repo of a translation model: https://github.com/Helsinki-NLP/Opus-MT/issues/62
I'm curious to see if anyone else has had interesting encounters with this.
- How worried are you about AI taking over music?
Yes, most models these days, except the exceptionally large ones, can be trained on a laptop. Of course it helps if your laptop has an Nvidia CUDA GPU, but even if it doesn't, you can rent an AWS 4-core/16GB GPU instance for 0.5 cents an hour. 24 hours of training time would be quite a lot for most models, unless you're trying to train an FB any-to-any-language type model; but typically the huge models are not the most interesting ones, and you can get very good results and interesting models with substantially smaller sets of data.

Opus-MT models are only one language to one language, but they're about 300MB per model, the quality rivals FB's models, and the speed is substantially faster. I don't have as many examples from the music space, as it's still a fairly underexplored area, but Google has released Magenta, which is a pretrained TensorFlow music model (actually a group of 3-4 models).
- Helsinki-NLP/Opus-MT: Open neural machine translation models and web services
fastText
- FastText Repo Archived
- Pixelfed and Naive Bayes: The Grandfather of Spam Filters Still Making Waves
- fastText is trained with cross-entropy, meaning that model scores can be used more effectively as a 'confidence', e.g. for spam, if you want to say something like "if prediction score > X, then filter". Naive Bayes is not ideal for this, because the 'naive' independence assumption makes the scores very uncalibrated (it tends to give extremely high or low confidence scores for most things).
Disclaimer: I haven't really thought about NLP for about 3 years, so there may be something better than this now.
[1] https://github.com/facebookresearch/fastText
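The calibration point above can be illustrated with a toy scikit-learn comparison (the corpus and labels are made up for illustration): Naive Bayes multiplies many per-word likelihoods as if they were independent, so its probabilities pile up near 0 or 1, while a cross-entropy-trained linear model tends to produce less extreme scores.

```python
# Toy sketch: compare predicted spam probabilities from Multinomial Naive Bayes
# versus logistic regression (trained with cross-entropy) on the same features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

texts = ["free money now", "win a free prize", "meeting at noon",
         "lunch tomorrow?", "free free free winner", "project status update"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = ham

X = CountVectorizer().fit_transform(texts)
nb = MultinomialNB().fit(X, labels)
lr = LogisticRegression().fit(X, labels)

p_nb = nb.predict_proba(X)[:, 1]
p_lr = lr.predict_proba(X)[:, 1]
print("NB spam probs:", p_nb.round(3))
print("LR spam probs:", p_lr.round(3))
```

A "filter if score > X" rule built on the logistic scores behaves more predictably, which is the commenter's point about confidence thresholds.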
- How worried are you about AI taking over music?
fasttext 50
- FLiP Stack Weekly for 06-Jan-2023
- Fasttext: Library for efficient text classification and representation learning
- Reverse Language Reconstructing by Consensus [D] [P]
https://github.com/facebookresearch/fastText: the readme may have what I need built in, but I'm not sure. I hate ML documentation. I would love to see data-input-to-data-output examples, because people expect us to understand their line of thought, and it just doesn't work out that way. This looks like what I need, but I've completely misinterpreted ML documentation many times. Ha.
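Since the commenter asks for data-input-to-data-output examples, here is a small sketch of fastText's supervised input format (one example per line, labels prefixed with `__label__`, as documented in the fastText README); the file name, labels, and texts are made up for illustration:

```python
# Sketch: writing a tiny training file in fastText's supervised format.
train_lines = [
    "__label__spam Win a FREE prize, click now",
    "__label__ham Are we still on for lunch tomorrow?",
    "__label__spam Cheap meds, no prescription needed",
]
with open("train.txt", "w") as f:
    f.write("\n".join(train_lines) + "\n")

with open("train.txt") as f:
    contents = f.read().splitlines()
print(contents)

# With the `fasttext` Python package installed, training and prediction
# would then look roughly like this (not executed here):
#   import fasttext
#   model = fasttext.train_supervised("train.txt")
#   model.predict("free prize inside")
```

So the data-in is labeled lines of text, and the data-out of `predict` is a label (or top-k labels) with a score.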
- Virtual Sommelier, text classifier in the browser
To use a model trained with fastText from the browser, it is necessary to load it via WebAssembly. However, you don't need any WebAssembly knowledge, as you can use the fasttext.js file, which contains all the glue code.
- Synonyms.vim: feedback needed
Having the backend code in the plugin repo, and in Python, held me off. I wrote it to split the Vimscript/Python side from the command that finds the info, as that allows using powerful tools like fasttext rather than a plain dictionary.
What are some alternatives?
OPUS-MT-train - Training open neural machine translation models
synonyms.vim - Finding synonyms of words within vim, save time going back and forth to thesaurus.
OpenNMT-py - Open Source Neural Machine Translation and (Large) Language Models in PyTorch
talk - Group video call for the web. No signups. No downloads. [Moved to: https://github.com/vasanthv/tlk]
Neural-Machine-Translated-communication-system - The model is designed to train a single large neural network to predict the correct translation by reading the given sentence.
TRIME - [EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
tensor2tensor - Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Gauss - Stable Diffusion macOS native app
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
React - The library for web and native user interfaces.
klpt - The Kurdish Language Processing Toolkit
thesaurus_query.vim - Multi-language Thesaurus Query and Replacement plugin for Vim/NeoVim