OPUS-MT-train
Tatoeba-Challenge
Metric | OPUS-MT-train | Tatoeba-Challenge
---|---|---
Mentions | 1 | 16
Stars | 302 | 770
Growth | 3.0% | 1.0%
Activity | 1.7 | 5.7
Last commit | about 2 months ago | 9 days ago
Language | Makefile | Makefile
License | MIT License | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
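The page does not say how the activity number is computed. As a rough sketch of one reading that is consistent with the example above (a score of 9.0 meaning the top 10%), the score could be ten times a project's percentile rank among all tracked projects. The function name `activity_score` and the raw input values are hypothetical, and the recency weighting of commits is assumed to have been applied already when the raw values were produced.

```python
from bisect import bisect_left


def activity_score(raw_value, all_raw_values):
    """Map a raw activity value to a 0-10 scale by percentile rank.

    Assumption: the site's real formula is not published; this only
    reproduces the "9.0 means top 10%" reading given in the text.
    """
    ranked = sorted(all_raw_values)
    # Count how many tracked projects are strictly less active.
    less_active = bisect_left(ranked, raw_value)
    return round(10 * less_active / len(ranked), 1)


# Hypothetical raw values for 100 tracked projects: 0, 1, ..., 99.
tracked = list(range(100))
score = activity_score(90, tracked)  # 90 of 100 projects are less active
```

Under this reading, a project scored 9.0 is more active than 90% of the tracked projects, which places it in the top 10%.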
OPUS-MT-train
- Amazon releases 51-language dataset for language understanding
https://translatelocally.com/ is a nice GUI around Marian/Bergamot. So far it bundles only a few language pairs, though I would guess that any of the models from https://github.com/Helsinki-NLP/Opus-MT-train/tree/master/mo... and https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/maste... should be usable.
There is also Apertium, a rule-based machine translation (RBMT) system. It is very good for some closely related pairs that have had a lot of work put into them (especially translation between Romance languages, e.g. Spanish→Catalan, and Norwegian Bokmål→Nynorsk), and it is the only OK translator for some lesser-resourced languages (e.g. Northern Saami→Norwegian Bokmål). It is very underdeveloped for anything to/from English, though: it feels a bit pointless writing rules for English when there is so much available data. RBMT shines where there is not enough available data, i.e. for most of the languages of the world.
Tatoeba-Challenge
- OpenAI GPT-3 vs Other Models [Benchmark] - Should AI companies be really worried?
Task: automatically translate a text from a language A to a language B. 1/ Dataset: we chose a dataset from the Tatoeba Translation Challenge of the Language Technology Research Group at the University of Helsinki. We took 100 examples from each of five Latin-script language pairs (deu-fra, eng-fra, fra-ita, deu-spa, deu-swe), which gives a 500-example test dataset.
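The sampling described in that post (100 examples from each of five language pairs, 500 in total) can be sketched as follows. This is only an illustration: the post does not say how the examples were selected, so uniform random sampling with a fixed seed is an assumption, and the `corpus` dictionary below is hypothetical placeholder data standing in for the real Tatoeba Translation Challenge files.

```python
import random

# Language pairs named in the post.
PAIRS = ["deu-fra", "eng-fra", "fra-ita", "deu-spa", "deu-swe"]


def build_test_set(corpus, pairs=PAIRS, per_pair=100, seed=0):
    """Draw `per_pair` (source, target) examples from each language pair.

    Assumption: uniform sampling without replacement; the original
    benchmark may have picked its examples differently.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    test_set = []
    for pair in pairs:
        for source, target in rng.sample(corpus[pair], per_pair):
            test_set.append({"pair": pair, "source": source, "target": target})
    return test_set


# Hypothetical placeholder corpus: 250 dummy sentence pairs per language pair.
corpus = {
    pair: [(f"{pair} source {i}", f"{pair} target {i}") for i in range(250)]
    for pair in PAIRS
}

test_set = build_test_set(corpus)
print(len(test_set))  # 5 pairs x 100 examples each
```

Keeping the pair label on every example makes it possible to report results per language pair as well as over the whole set.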
- [P] What we learned by accelerating by 5X Hugging Face generative language models
#1: University of Helsinki language technology professor Jörg Tiedemann has released a dataset with over 500 million translated sentences in 188 languages
#2: The NLP Index: 3,000+ code repos for hackers and researchers. [self-promotion]
#3: A Python library to boost T5 models speed up to 5x & reduce the model size by 3x.
- Labelling of Text (NLP)
#1: Matching GPT-3's performance with just 0.1% of its parameters
#2: University of Helsinki language technology professor Jörg Tiedemann has released a dataset with over 500 million translated sentences in 188 languages
#3: Trained a Markov Chain on a bunch of r/WSB posts and comments. Only 2-word conditional probabilities but honestly, that's all that's necessary 🚀🚀
- Helsinki professor Jörg Tiedemann – 500M translations in 188 languages
- Thought it could be useful to someone
- University of Helsinki language technology professor Jörg Tiedemann has released a dataset with over 500 million translated sentences in 188 languages
- Translated language database released by Helsinki scientist
- 500 million sentences in 188 languages
What are some alternatives?
Opus-MT - Open neural machine translation models and web services
COMET - A Neural Framework for MT Evaluation
NLP-progress - Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
fastseq - An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/pdf/2106.04718.pdf
tensor2tensor - Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
edenai-apis - Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines
Face-Recognition_Flutter - A sample Face recognition app using Flutter and Firebase ML Kit
AutomaticKeyphraseExtraction - Data for Automatic Keyphrase Extraction Task
klpt - The Kurdish Language Processing Toolkit
deep-learning-drizzle - Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!