Python data-augmentation

Open-source Python projects categorized as data-augmentation

Top 23 Python data-augmentation Projects

  1. snorkel

    A system for quickly generating training data with weak supervision

    Project mention: Harnessing Weak Supervision to Isolate Sign Language in Crowded News Videos | news.ycombinator.com | 2024-08-15

    Hello everyone, we are trying to make a large dataset for Sign Language translation, inspired by BSL-1K [1]. As part of cleaning our collected videos, we use a nice technique for aggregating heuristic labels [2]. We thought it was interesting enough to share with people on here.

    [1] https://www.robots.ox.ac.uk/~vgg/research/bsl1k/

    [2] https://github.com/snorkel-team/snorkel

  3. TextAttack

    TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

  4. webdataset

    A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

  5. torchio

    Medical imaging processing for deep learning.

  6. fastdup

    fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

    Project mention: AIM Weekly 17 June 2024 | dev.to | 2024-06-17
  7. eda_nlp

    Data augmentation for NLP, presented at EMNLP 2019

  8. inltk

    Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.

  10. synthcity

    A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.

  11. image_augmentor

    Data augmentation tool for images

  12. ModelNet40-C

    Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

  13. question_extractor

    Generate question/answer training pairs out of raw text.

  14. GAug

    AAAI'21: Data Augmentation for Graph Neural Networks

  15. ContraD

    Code for the paper "Training GANs with Stronger Augmentations via Contrastive Discriminator" (ICLR 2021)

  16. genius

    💡GENIUS – generating text using sketches! A strong text generation & data augmentation tool.

  17. mutate

    A library to synthesize text datasets using Large Language Models (LLM)

  18. KitanaQA

    KitanaQA: Adversarial training and data augmentation for neural question-answering models (by searchableai)

  19. augini

    augini: AI-Powered Tabular Data Assistant

    Project mention: Show HN: Augini, an AI-Powered Tabular Data Assistant | news.ycombinator.com | 2024-12-25
  20. vkit

    Boosting Document Intelligence

  21. targetran

    Python library for data augmentation in object detection or image classification model training

  22. degradr

    Python library for realistically degrading images.

  23. fastaugment

    A handy data augmentation toolkit for image classification put in a single efficient TensorFlow/PyTorch op.

  24. MTR

    The official implementation of the paper "Rethinking Data Augmentation for Tabular Data in Deep Learning" (by somaonishi)

  25. shutter

    Stochastic image generator for annotated synthetic datasets (by Rainelz)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python data-augmentation related posts

  • How to use data stored in a (private) S3 Bucket for training?

    1 project | /r/pytorch | 21 Jul 2023
  • [D] Title: Best tools and frameworks for working with million-billion image datasets?

    1 project | /r/MachineLearning | 26 Mar 2023
  • [D] Training networks on extremely large datasets (10+TB)?

    1 project | /r/MachineLearning | 17 Feb 2023
  • Best language model for filling multiple related masks [D]

    1 project | /r/MachineLearning | 9 Jan 2023
  • Hi everyone, my first Reddit post, let me introduce the GENIUS model.

    2 projects | /r/deeplearning | 23 Nov 2022
  • [D] Efficiently loading videos in PyTorch without extracting frames

    5 projects | /r/MachineLearning | 26 Oct 2021
  • New image augmentation library for TF Dataset + TPU

    1 project | /r/tensorflow | 14 Sep 2021

Index

What are some of the best open-source data-augmentation projects in Python? This list will help you:

#    Project              Stars
1    snorkel              5,828
2    TextAttack           3,060
3    webdataset           2,457
4    torchio              2,125
5    fastdup              1,651
6    eda_nlp              1,612
7    inltk                826
8    synthcity            515
9    image_augmentor      450
10   ModelNet40-C         216
11   question_extractor   206
12   GAug                 189
13   ContraD              188
14   genius               175
15   mutate               151
16   KitanaQA             57
17   augini               26
18   vkit                 22
19   targetran            20
20   degradr              19
21   fastaugment          15
22   MTR                  14
23   shutter              2
