Metric | uda | squirrel-datasets-core
---|---|---
Mentions | 2 | 2
Stars | 2,153 | 43
Growth | 0.0% | -
Activity | 0.0 | 2.3
Last commit | over 2 years ago | 8 months ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
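The exact formula behind the activity number is not given here, so the following is only a rough sketch of how such a score could work. It assumes an exponential decay with a hypothetical 30-day half-life for weighting commits, and a simple percentile rank scaled to 0-10 for the relative number; both function names and both choices are illustrative assumptions, not the site's actual method.

```python
def weighted_activity(commit_ages_days, half_life_days=30.0):
    """Raw activity: each commit counts less the older it is.

    Assumption: a commit's weight halves every `half_life_days` days.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def relative_activity(raw_scores):
    """Scale raw scores to 0-10 by the share of projects that score lower.

    Under this assumed mapping, a value of 9.0 means 90% of the tracked
    projects have a lower raw score, i.e. the project is in the top 10%.
    """
    total = len(raw_scores)
    return [
        10.0 * sum(other < score for other in raw_scores) / total
        for score in raw_scores
    ]


# Hypothetical commit ages (in days) for three projects.
projects = {
    "busy": [1, 2, 3, 5, 8],
    "steady": [20, 40, 60],
    "stale": [800, 900],
}
raw = [weighted_activity(ages) for ages in projects.values()]
for name, score in zip(projects, relative_activity(raw)):
    print(f"{name}: {score:.1f}")
```

With a decay like this, a project whose last commit is years old ends up with a raw score near zero, which is consistent with an activity of 0.0 in the table.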
uda
-
BERT models: how resilient are they to typos?
Another thought is to do some data augmentation using back-translation, a la https://arxiv.org/abs/1904.12848
-
A Visual Survey of Data Augmentation in NLP
The words that replace the original word are chosen by calculating TF-IDF scores of words over the whole document and taking the lowest ones. You can refer to the code implementation from the original paper for this.
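As a minimal sketch of the TF-IDF idea described in the mention above, the snippet below scores words in a tiny tokenized corpus, picks the lowest-scoring positions in one document, and swaps them for other low-scoring words. This is a simplified illustration and not the implementation from the paper; the function names, the `pool_size` parameter, and the toy corpus are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return [
        {
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in Counter(doc).items()
        }
        for doc in docs
    ]


def replace_low_tfidf(docs, doc_index, n_replace=1, pool_size=5, seed=0):
    """Replace the n_replace lowest-scoring words of one document.

    Replacements are drawn from the pool_size lowest-scoring words in the
    corpus (an assumed simplification). The input lists are not modified.
    """
    rng = random.Random(seed)
    scores = tfidf_scores(docs)
    doc = docs[doc_index]

    # Positions in the document, least informative words first.
    by_score = sorted(range(len(doc)), key=lambda i: scores[doc_index][doc[i]])
    targets = set(by_score[:n_replace])

    # Corpus-level score per word: its highest TF-IDF in any document.
    corpus_score = {}
    for doc_scores in scores:
        for word, value in doc_scores.items():
            corpus_score[word] = max(value, corpus_score.get(word, 0.0))
    ranked = sorted(corpus_score, key=lambda w: (corpus_score[w], w))
    pool = ranked[:pool_size]

    augmented = []
    for i, word in enumerate(doc):
        candidates = [w for w in pool if w != word]
        if i in targets and candidates:
            augmented.append(rng.choice(candidates))
        else:
            augmented.append(word)
    return augmented


corpus = [
    ["the", "movie", "was", "great"],
    ["the", "plot", "was", "dull"],
    ["the", "acting", "was", "superb"],
]
print(replace_low_tfidf(corpus, 0, n_replace=2, pool_size=2))
```

Words that occur in every document, such as "the" and "was" here, get a TF-IDF score of zero, so they are the ones replaced, while the informative words "movie" and "great" are left untouched.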
squirrel-datasets-core
-
[P] Squirrel: A new OS library for fast & flexible large-scale data loading
Have a look at this tutorial to learn how to convert to messagepack by using Spark.
What are some alternatives?
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
squirrel-core - A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
SSL4MIS - Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.
datasaurus - Do computer vision with 1000x less data
nlpaug - Data augmentation for NLP
podium - Podium: a framework agnostic Python NLP library for data loading and preprocessing
clip-as-service - 🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
contractions - Fixes contractions such as `you're` to `you are`
bert - TensorFlow code and pre-trained models for BERT