uda
Unsupervised Data Augmentation (UDA) (by google-research)
contractions
Fixes contractions such as `you're` to `you are` (by kootenpv)
Metric | uda | contractions
---|---|---
Mentions | 2 | 2
Stars | 2,153 | 300
Growth | 0.0% | -
Activity | 0.0 | 0.0
Last commit | over 2 years ago | over 1 year ago
Language | Python | Python
License | Apache License 2.0 | MIT License
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
uda
Posts with mentions or reviews of uda.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2020-08-26.
- BERT models: how resilient are they to typos?
  Another thought is to do some data augmentation using back-translation, a la https://arxiv.org/abs/1904.12848
- A Visual Survey of Data Augmentation in NLP
  The words that replace the original word are chosen by calculating TF-IDF scores of words over the whole document and taking the lowest ones. You can refer to the code implementation for this in the original paper here.
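The TF-IDF selection step described in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the UDA code: the function names `tfidf_scores` and `replacement_candidates`, the whitespace tokenization, and the toy corpus are all assumptions made for the example. It only picks the lowest-scoring (least informative) words; how replacements are then sampled is left out.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores


def replacement_candidates(docs, doc_index, k=2):
    """Return the k lowest-scoring words of one document (hypothetical helper)."""
    scores = tfidf_scores(docs)[doc_index]
    # Sort by score, then alphabetically so that ties are deterministic.
    ranked = sorted(scores, key=lambda w: (scores[w], w))
    return ranked[:k]


# Toy corpus, assumed purely for illustration.
docs = [
    "the movie was a great movie".split(),
    "the plot was thin".split(),
    "the acting was great".split(),
]
print(replacement_candidates(docs, 0, k=2))
```

Words that appear in every document get a score of zero, so they are the first candidates for replacement, while rare, content-bearing words such as "movie" are kept.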
contractions
Posts with mentions or reviews of contractions.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2020-08-26.
- Tool for normalizing abbreviations?
  Do you mean abbreviations or contractions? For contractions, there is this library.
- A Visual Survey of Data Augmentation in NLP
  In the paper, the author gives an example of transforming verbal forms from contraction to expansion and vice versa. We can generate augmented texts by applying this. Because the transformation should not change the meaning of the sentence, it can fail when expanding ambiguous verbal forms. To resolve this, the paper proposes that we allow ambiguous contractions but skip ambiguous expansion. You can find a list of contractions for the English language here. For expansion, you can use the contractions library in Python.
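The rule from the excerpt above (expand only where the result is unambiguous) could look something like the following sketch. It deliberately does not use the contractions library: the `UNAMBIGUOUS` and `AMBIGUOUS` tables and the `expand_contractions` function are hypothetical and far smaller than any real contraction list.

```python
import re

# Hypothetical, deliberately tiny mapping; a real list would be much longer.
UNAMBIGUOUS = {
    "you're": "you are",
    "can't": "cannot",
    "won't": "will not",
    "i'm": "I am",
}

# Forms with more than one reading (e.g. "she's" = "she is" or "she has")
# are skipped rather than guessed.
AMBIGUOUS = {"she's", "he's", "it's", "i'd"}


def expand_contractions(text):
    """Expand only the contractions that have a single possible expansion."""

    def replace(match):
        word = match.group(0)
        key = word.lower()
        if key in AMBIGUOUS or key not in UNAMBIGUOUS:
            return word  # leave ambiguous or unknown forms untouched
        expanded = UNAMBIGUOUS[key]
        # Preserve a leading capital, e.g. "You're" -> "You are".
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return re.sub(r"\b\w+'\w+\b", replace, text)


print(expand_contractions("You're right, she's gone and I can't stay"))
```

Keeping the ambiguous forms in their own set makes the skip rule explicit, so the augmented sentence never commits to a reading the original did not have.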
What are some alternatives?
When comparing uda and contractions you can also consider the following projects:
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
nlpaug - Data augmentation for NLP
SSL4MIS - Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.
clip-as-service - 🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
squirrel-datasets-core - Squirrel dataset hub
bert - TensorFlow code and pre-trained models for BERT
squirrel-core - A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰