| | Tatoeba-Challenge | AutomaticKeyphraseExtraction |
|---|---|---|
| Mentions | 16 | 1 |
| Stars | 781 | 336 |
| Growth | 1.4% | - |
| Activity | 5.3 | 10.0 |
| Latest commit | 4 days ago | about 6 years ago |
| Language | Makefile | - |
| License | GNU General Public License v3.0 or later | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Tatoeba-Challenge
-
OpenAI GPT-3 vs Other Models [Benchmark] - Should AI companies really be worried?
Automatically translate a text from language A to language B. 1/ Dataset: we chose a dataset from the Language Technology Research Group at the University of Helsinki's Tatoeba Translation Challenge. We took 100 examples from each of five Latin-script language pairs: deu-fra, eng-fra, fra-ita, deu-spa, and deu-swe, which constitutes a 500-example test dataset.
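The sampling step above (100 aligned examples from each of five pairs, giving a 500-example test set) can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the file layout of the Tatoeba-Challenge release is not shown in the post, so the stand-in corpus and the `sample_aligned` helper are assumptions.

```python
import random

# The five language pairs named in the benchmark description.
PAIRS = ["deu-fra", "eng-fra", "fra-ita", "deu-spa", "deu-swe"]
N_PER_PAIR = 100

def sample_aligned(src_lines, tgt_lines, n, seed=0):
    """Sample n (source, target) pairs, keeping the line-level alignment intact."""
    assert len(src_lines) == len(tgt_lines), "source/target files must align"
    rng = random.Random(seed)          # fixed seed for a reproducible test set
    idx = rng.sample(range(len(src_lines)), n)
    return [(src_lines[i], tgt_lines[i]) for i in idx]

# Stand-in corpus; in practice src/tgt would be read from the per-pair
# source and target files of the Tatoeba-Challenge test sets.
src = [f"satz {i}" for i in range(1000)]
tgt = [f"phrase {i}" for i in range(1000)]

testset = {pair: sample_aligned(src, tgt, N_PER_PAIR, seed=i)
           for i, pair in enumerate(PAIRS)}
total = sum(len(examples) for examples in testset.values())  # 5 x 100 = 500
```

Sampling by index rather than shuffling each file separately is what keeps each source sentence paired with its translation.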
-
Amazon releases 51-language dataset for language understanding
https://translatelocally.com/ is a nice gui around marian/bergamot. So far not very many bundled pairs, though I would guess any of the models from https://github.com/Helsinki-NLP/Opus-MT-train/tree/master/mo... and https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/maste... should be usable.
There is also Apertium, a rule-based system which is very good for some closely-related pairs that have had a lot of work put into them (especially translation between Romance languages, e.g. Spanish→Catalan, and Norwegian Bokmål→Nynorsk), and the only OK translator for some lesser-resourced languages (e.g. Northern Saami→Norwegian Bokmål). It is very underdeveloped for anything to/from English, though; it feels a bit pointless writing rules for English when there is so much available data. RBMT shines where there isn't enough available data, i.e. most of the languages of the world.
-
[P] What we learned by accelerating by 5X Hugging Face generative language models
- #1: University of Helsinki language technology professor Jörg Tiedemann has released a dataset with over 500 million translated sentences in 188 languages | 0 comments
- #2: The NLP Index: 3,000+ code repos for hackers and researchers. [self-promotion]
- #3: A Python library to boost T5 models speed up to 5x & reduce the model size by 3x.
-
Labelling of Text (NLP)
- #1: Matching GPT-3's performance with just 0.1% of its parameters
- #2: University of Helsinki language technology professor Jörg Tiedemann has released a dataset with over 500 million translated sentences in 188 languages | 0 comments
- #3: Trained a Markov Chain on a bunch of r/WSB posts and comments. Only 2-word conditional probabilities but honestly, that's all that's necessary 🚀🚀
- Helsinki professor Jörg Tiedemann – 500M translations in 188 languages
- Thought it could be useful to someone
- University of Helsinki language technology professor Jörg Tiedemann has released a dataset with over 500 million translated sentences in 188 languages
- Translated language database released by Helsinki scientist
- 500 million sentences in 188 languages
AutomaticKeyphraseExtraction
-
OpenAI GPT-3 vs Other Models [Benchmark] - Should AI companies really be worried?
Keyword or keyphrase extraction is about extracting the words or phrases that best represent a given text. 1/ Dataset: we selected our datasets from the public GitHub repository AutomaticKeyphraseExtraction. Most of the datasets listed there were too long for OpenAI's 4k-token limit, so we had to go with the Hulth2003 abstracts dataset. Since the different providers are trained to return keywords and keyphrases present in the original text, we did some cleaning to remove all keywords that were not present in the abstracts. We ended up with 470 abstracts.
What are some alternatives?
OPUS-MT-train - Training open neural machine translation models
edenai-apis - Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines
COMET - A Neural Framework for MT Evaluation
fastseq - An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/pdf/2106.04718.pdf