-
Yes! We are really motivated by translation as a practical technology that people need; part of our work was interviewing many native speakers of low-resource languages. As part of that, we do experiment with distillation. That's detailed in Section 8.6 of our paper (https://arxiv.org/pdf/2207.04672.pdf), where we compare two different distillation approaches. We also describe how we used distillation to create the models that serve Wikipedia's Content Translation tool (which you can use to write new Wikipedia articles), as well as distillation of the full NLLB-200 model. These distilled models are available for download on GitHub: https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/modeling. On your question about productionization: we partnered with our production translation team to integrate the modeling techniques and learnings from the NLLB project into production translation. These are live on Facebook and Instagram today for some languages! [angela]
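For readers unfamiliar with distillation, here is a minimal sketch of a generic word-level distillation loss for a single target token. It is not taken from the paper and is not claimed to match either approach compared in Section 8.6; the function names, the `alpha` mixing weight, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def word_level_distillation_loss(student_logits, teacher_logits, gold_index,
                                 alpha=0.5, temperature=1.0):
    """Illustrative loss for one target token (hypothetical helper).

    Mixes two terms:
      * cross-entropy of the student against the reference token, and
      * KL(teacher || student) over the vocabulary, so the student is
        pulled towards the teacher's full output distribution.
    alpha and temperature are assumed hyperparameters, not values from the paper.
    """
    # Cross-entropy against the reference token (no temperature applied).
    gold_probs = softmax(student_logits)
    nll = -math.log(gold_probs[gold_index])

    # KL divergence between the softened teacher and student distributions.
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )
    return (1 - alpha) * nll + alpha * kl


# Toy example: a 4-word vocabulary, reference token at index 2.
teacher = [0.1, 0.2, 3.0, 0.5]
student = [0.3, 0.1, 1.5, 0.9]
loss = word_level_distillation_loss(student, teacher, gold_index=2)
```

In a real training loop this quantity would be averaged over all target positions in a batch and computed with an autodiff framework; the pure-Python version above only shows the arithmetic.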
-
You can check out some of our materials and open sourced artifacts here:
- Our latest blog post: https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation
- Project Overview: https://ai.facebook.com/research/no-language-left-behind/
- Product demo: https://nllb.metademolab.com/
- Research paper: https://research.facebook.com/publications/no-language-left-behind
- NLLB-200: https://github.com/facebookresearch/fairseq/tree/nllb
- FLORES-200: https://github.com/facebookresearch/flores
- LASER3: https://github.com/facebookresearch/LASER

Joining us today for the AMA are:
- Angela Fan (AF), Research Scientist
- Jean Maillard (JM), Research Scientist
- Maha Elbayad (ME), Research Scientist
- Philipp Koehn (PK), Research Scientist
- Shruti Bhosale (SB), Software Engineer

We’ll be here from 07/21/2022 @09:00AM PT - 10:00AM PT

Thanks and we’re looking forward to answering your questions!
-
stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
We have a bunch! The model and data are available here: https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/modeling, LASER3 here: https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/laser_distillation, training data here: https://github.com/facebookresearch/fairseq/tree/nllb/examples/nllb/data, FLORES and our other human-translated datasets here: https://github.com/facebookresearch/flores, and an entire modular pipeline for data cleaning here: https://github.com/facebookresearch/stopes. It's also available on HuggingFace! [angela]