wit vs trankit
Metric | wit | trankit
---|---|---
Mentions | 5 | 1
Stars | 957 | 707
Growth | 1.1% | -
Activity | 5.3 | 5.7
Last commit | 6 months ago | 20 days ago
Language | Python |
License | GNU General Public License v3.0 or later | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
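The three metrics above can be illustrated with a small sketch. The site's actual formulas are not given here, so everything below is an assumption made for illustration: the function names, the exponential half-life used to weight recent commits more heavily, and the percentile-to-score mapping are all hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars relative to last month's count."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return (stars_now - stars_last_month) / stars_last_month * 100.0


def weighted_commits(ages_in_days: Sequence[float], half_life_days: float = 30.0) -> float:
    """Sum of commit weights; a commit's weight halves every half_life_days.

    The exponential decay and the 30-day half-life are assumptions, chosen
    only so that recent commits count more than older ones.
    """
    return sum(0.5 ** (age / half_life_days) for age in ages_in_days)


def activity_score(raw: float, all_raw: Sequence[float]) -> float:
    """Map a raw activity value to a 0-10 score by percentile rank.

    A score of 9.0 means 90% of tracked projects have a lower raw value,
    i.e. the project is in the top 10%.
    """
    ranked = sorted(all_raw)
    below = bisect_left(ranked, raw)  # projects strictly less active
    return round(10.0 * below / len(ranked), 1)
```

Under these assumptions, a project whose raw activity exceeds that of 90 out of 100 tracked projects gets `activity_score(...) == 9.0`, which matches the "top 10%" reading of the example above.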
wit
-
[R] Cross-lingual Wikipedia dataset
There's the Wikipedia Image Text dataset, which has many languages (including English and Simple English) as well as a TF Datasets wrapper. https://github.com/google-research-datasets/wit
-
[R] Google AI Introduces ‘WIT’, A Wikipedia-Based Image Text Dataset For Multimodal Multilingual Machine Learning
Code for https://arxiv.org/abs/2103.01913 found: https://github.com/google-research-datasets/wit
-
Google AI Introduces ‘WIT’, A Wikipedia-Based Image Text Dataset For Multimodal Multilingual Machine Learning
To overcome these limitations, the Google Research team created a large, high-quality multilingual dataset called the Wikipedia-Based Image Text (WIT) Dataset. It was built by extracting multiple text selections associated with each image from Wikipedia articles and Wikimedia image links.
-
Hacker News top posts: Mar 4, 2021
Wit: Wikipedia-Based Image Text Dataset (0 comments)
trankit
-
Trankit v1.0.0 - An open-source Transformer-based Multilingual NLP Toolkit for 56 languages is out.
Trankit is written in Python and can be easily installed via pip. Our code and pretrained models are publicly available at: https://github.com/nlp-uoregon/trankit
What are some alternatives?
lion - Where Lions Roam: RISC-V on the VELDT
spaCy - 💫 Industrial-strength Natural Language Processing (NLP) in Python
witokit - A Python toolkit to generate a tokenized dump of Wikipedia for NLP
Stanza - Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
WhereIsAI - AI company, product, and tool collection.
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
courses - This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
argilla - Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
cbonsai
wiktextract - Wiktionary dump file parser and multilingual data extractor
flair - A very simple framework for state-of-the-art Natural Language Processing (NLP)
Sentimentanalysis - Language independent sentiment analysis