squirrel-datasets-core
uda
Metric | squirrel-datasets-core | uda |
---|---|---|
Mentions | 2 | 2 |
Stars | 43 | 2,153 |
Growth | - | 0.0% |
Activity | 2.3 | 0.0 |
Last commit | 8 months ago | over 2 years ago |
Language | Python | Python |
License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub.
Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
squirrel-datasets-core
- [P] Squirrel: A new OS library for fast & flexible large-scale data loading

  Have a look at this tutorial to learn how to convert to MessagePack by using Spark.
uda
- BERT models: how resilient are they to typos?

  Another thought is to do some data augmentation using back-translation, a la https://arxiv.org/abs/1904.12848

- A Visual Survey of Data Augmentation in NLP

  The words that replace the original word are chosen by calculating TF-IDF scores of words over the whole document and taking the lowest ones. You can refer to the original paper's code implementation for this.
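The TF-IDF-based replacement mentioned in the excerpt above can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the code from the uda repository: the helper names (`tfidf_scores`, `lowest_scoring_words`, `replace_low_tfidf`), the toy corpus, and the choice to sum per-document scores into a corpus-wide score are all hypothetical. It scores words with a plain TF-IDF, builds a replacement pool from the lowest-scoring words in the corpus, and swaps the least informative words of one sentence for words from that pool.

```python
import math
import random
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents each word appears in.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores


def lowest_scoring_words(per_doc_scores, k):
    """Sum scores over the corpus and return the k lowest-scoring words."""
    totals = Counter()
    for doc_scores in per_doc_scores:
        for word, score in doc_scores.items():
            totals[word] += score
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(totals, key=lambda w: (totals[w], w))[:k]


def replace_low_tfidf(doc, doc_scores, pool, n_replace, seed=0):
    """Replace the n_replace lowest-scoring tokens of doc with pool words."""
    rng = random.Random(seed)
    # Token positions ordered from least to most informative.
    positions = sorted(range(len(doc)), key=lambda i: doc_scores[doc[i]])
    augmented = list(doc)
    for i in positions[:n_replace]:
        augmented[i] = rng.choice(pool)
    return augmented


# Toy corpus (an assumption for illustration only).
docs = [
    "the movie was a great success".split(),
    "the plot of the movie was dull".split(),
    "the acting was superb".split(),
]
scores = tfidf_scores(docs)
pool = lowest_scoring_words(scores, k=2)
augmented = replace_low_tfidf(docs[0], scores[0], pool, n_replace=2)
print(pool)
print(" ".join(augmented))
```

Words that occur in every document get an inverse document frequency of zero, so they are the first to be replaced and the first to enter the pool, while higher-scoring words such as "great" and "success" are left untouched. A real pipeline would more likely use a library vectorizer and a much larger corpus.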
What are some alternatives?
squirrel-core - A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
datasaurus - Do computer vision with 1000x less data
SSL4MIS - Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.
podium - Podium: a framework agnostic Python NLP library for data loading and preprocessing
nlpaug - Data augmentation for NLP
clip-as-service - 🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
contractions - Fixes contractions such as `you're` to `you are`
bert - TensorFlow code and pre-trained models for BERT