Top 23 Python data-augmentation Projects
-
Project mention: Harnessing Weak Supervision to Isolate Sign Language in Crowded News Videos | news.ycombinator.com | 2024-08-15
Hello everyone, we are trying to make a large dataset for Sign Language translation, inspired by BSL-1K [1]. As part of cleaning our collected videos, we use a nice technique for aggregating heuristic labels [2]. We thought it was interesting enough to share with people on here.
[1] https://www.robots.ox.ac.uk/~vgg/research/bsl1k/
[2] https://github.com/snorkel-team/snorkel
-
TextAttack
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP. Documentation: https://textattack.readthedocs.io/en/master/
-
webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
-
fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.
-
inltk
Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.
-
synthcity
A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
-
ModelNet40-C
Repository for the paper "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" (https://arxiv.org/abs/2201.12296)
-
ContraD
Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)
-
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models (by searchableai)
-
Project mention: Show HN: Augini, an AI-Powered Tabular Data Assistant | news.ycombinator.com | 2024-12-25
-
targetran
Python library for data augmentation in object detection or image classification model training
-
fastaugment
A handy data augmentation toolkit for image classification put in a single efficient TensorFlow/PyTorch op.
-
MTR
The official implementation of the paper "Rethinking Data Augmentation for Tabular Data in Deep Learning" (by somaonishi)
Python data-augmentation discussion
Python data-augmentation related posts
-
How to use data stored in a (private) S3 Bucket for training?
-
[D] Title: Best tools and frameworks for working with million-billion image datasets?
-
[D] Training networks on extremely large datasets (10+TB)?
-
Best language model for filling multiple related masks [D]
-
Hi everyone, my first Reddit post, let me introduce the GENIUS model.
-
[D] Efficiently loading videos in PyTorch without extracting frames
-
New image augmentation library for TF Dataset + TPU
Index
What are some of the best open-source data-augmentation projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | snorkel | 5,828 |
2 | TextAttack | 3,060 |
3 | webdataset | 2,457 |
4 | torchio | 2,125 |
5 | fastdup | 1,651 |
6 | eda_nlp | 1,612 |
7 | inltk | 826 |
8 | synthcity | 515 |
9 | image_augmentor | 450 |
10 | ModelNet40-C | 216 |
11 | question_extractor | 206 |
12 | GAug | 189 |
13 | ContraD | 188 |
14 | genius | 175 |
15 | mutate | 151 |
16 | KitanaQA | 57 |
17 | augini | 26 |
18 | vkit | 22 |
19 | targetran | 20 |
20 | degradr | 19 |
21 | fastaugment | 15 |
22 | MTR | 14 |
23 | shutter | 2 |