OPUS-MT-train
NLP-progress
Metric | OPUS-MT-train | NLP-progress
---|---|---
Mentions | 1 | 17
Stars | 304 | 22,328
Growth | 3.6% | -
Activity | 1.7 | 2.1
Last commit | about 2 months ago | 12 days ago
Language | Makefile | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
OPUS-MT-train
- Amazon releases 51-language dataset for language understanding
https://translatelocally.com/ is a nice GUI around Marian/Bergamot. So far it bundles few language pairs, though I would guess any of the models from https://github.com/Helsinki-NLP/Opus-MT-train/tree/master/mo... and https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/maste... should be usable.
There is also Apertium, a rule-based machine translation (RBMT) system. It is very good for some closely related pairs that have had a lot of work put into them (especially translation between Romance languages, e.g. Spanish→Catalan, and Norwegian Bokmål→Nynorsk), and it is the only OK translator for some lesser-resourced languages (e.g. Northern Saami→Norwegian Bokmål). It is very underdeveloped for anything to/from English, though: it feels a bit pointless writing rules for English when there is so much available data. RBMT shines where there's not enough available data, i.e. for most of the languages of the world.
NLP-progress
- [Discussion] Checklist of seminal NLP papers
- NLP research status
- [D] How difficult/easy is it to learn NLP once you have experience in CV?
One thing is that NLP is a set of wildly different problems that share some aspects but often use quite different techniques and assumptions about their datasets. So even if you have NLP experience, when you need to start on a substantially different NLP task you can't just apply what you know and succeed; you have to review "how things are done" for that problem domain. For a quick overview, sites like https://nlpprogress.com/ can be helpful to see what methods are used and, perhaps even more importantly, how people are modeling the actual task.
- Upcoming App Announcement: Lemmatize, a Foreign Language Reader
A standard step in Chinese text processing is word segmentation, which deals with this problem.
- Is there a site tracking computer vision progress?
NLP has a GitHub project tracking NLP progress, https://github.com/sebastianruder/NLP-progress. I want to know if there is one tracking computer vision progress.
- [P] NLP "tl;dr" Notes on Transformers
It would also be cool to have some charts with parameter density and even overall effectiveness (a tl;dr version of SOTA-trackers, maybe?) if that doesn't prove too infeasible.
- What are state-of-the-art methods for abstractive text summarization?
- BreadPanes 81: "They/Them"
As I said, it increases ambiguity and cognitive overhead needlessly, given that "it" exists. Moreover, it also makes it harder for artificial intelligence to understand human text: https://github.com/sebastianruder/NLP-progress/blob/master/english/coreference_resolution.md
- [Request] Curated Advanced NLP Resources
I could not find it on the internet (including on GitHub, Kaggle, Medium, or Reddit), and I already know about NLP Progress and The Super Duper NLP Repo.
- How do you guys find/keep up to date with the latest NLP papers?
For someone who needs to be on top of the latest research: Twitter (distraction-prone, marketing-friendly, instantly-gratifying, quick); newsletters in ML + NLP (https://jack-clark.net/, ruder.io, offconvex.org, etc.) (distraction-free, generic, time-consuming); SOTA chasing (https://paperswithcode.com/, http://nlpprogress.com/) (distraction-free, generic + focused, code-friendly).
What are some alternatives?
Opus-MT - Open neural machine translation models and web services
nlp_tasks - Natural Language Processing Tasks and References
Tatoeba-Challenge
wtpsplit - Code for Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation