GAug
AAAI'21: Data Augmentation for Graph Neural Networks (by zhao-tong)
webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch. (by webdataset)
| | GAug | webdataset |
|---|---|---|
| Mentions | 1 | 7 |
| Stars | 181 | 1,962 |
| Growth | - | 3.3% |
| Activity | 0.0 | 8.8 |
| Last commit | 8 days ago | 19 days ago |
| Language | Python | Python |
| License | MIT License | BSD 3-clause "New" or "Revised" License |
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
GAug
Posts with mentions or reviews of GAug. We have used some of these posts to build our list of alternatives and similar projects.
- [D] What are some graph data augmentations?
Code for https://arxiv.org/abs/2006.06830 found: https://github.com/GAugAuthors/GAug
webdataset
Posts with mentions or reviews of webdataset. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-10-01.
- How to use data stored in a (private) S3 Bucket for training?
As an alternative, I've looked into using WebDataset, but couldn't figure out how to access data that is stored in a private bucket.
- [D] Best tools and frameworks for working with million-billion image datasets?
- [D] Training networks on extremely large datasets (10+TB)?
You can try webdataset (https://github.com/webdataset/webdataset).
- Question: TIFF image dataset - size in RAM.
- How to upload large amounts of data to a server?
Compress it to .tar format and then load it as a WebDataset.
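The .tar suggestion above can be sketched with the standard library alone: WebDataset reads plain tar archives in which each sample is a group of files sharing a key prefix (e.g. `000000.json`). A minimal sketch, assuming JSON payloads and made-up file names:

```python
import io
import json
import tarfile

def write_webdataset_shard(shard_path, samples):
    """Write samples into a .tar shard laid out the way WebDataset expects:
    one archive member per sample field, named <key>.<extension>."""
    with tarfile.open(shard_path, "w") as tar:
        for i, sample in enumerate(samples):
            key = f"{i:06d}"
            payload = json.dumps(sample).encode("utf-8")
            info = tarfile.TarInfo(name=f"{key}.json")
            info.size = len(payload)  # tarfile requires the size up front
            tar.addfile(info, io.BytesIO(payload))

samples = [{"text": "hello"}, {"text": "world"}]
write_webdataset_shard("shard-000000.tar", samples)
```

The resulting archive could then be consumed with `webdataset.WebDataset("shard-000000.tar")`, or uploaded to a server as a single file.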
- Does MIT 6.824 help for distributed deep learning?
Would guess not, but there should be some good niche resources: check out the introductory videos here: https://github.com/webdataset/webdataset
- How to effectively load a large text dataset with PyTorch?
I found a pretty good solution, similar to TFRecord from TensorFlow: load the data, tokenize it, and save the arrays in shards with the webdataset package.
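The tokenize-then-shard workflow described above can be sketched as follows. The real library provides `webdataset.ShardWriter` for the writing step; this standalone sketch uses only the standard library, and the whitespace tokenizer, file names, and shard size are illustrative placeholders:

```python
import io
import json
import tarfile

def tokenize(line):
    # Placeholder tokenizer; a real pipeline would use e.g. a trained
    # subword tokenizer instead of whitespace splitting.
    return line.split()

def shard_corpus(lines, prefix, max_per_shard=2):
    """Tokenize each line and spread the samples over numbered .tar shards,
    mimicking the layout WebDataset reads back."""
    shard_paths = []
    tar = None
    for i, line in enumerate(lines):
        if i % max_per_shard == 0:  # start a new shard
            if tar:
                tar.close()
            path = f"{prefix}-{len(shard_paths):06d}.tar"
            shard_paths.append(path)
            tar = tarfile.open(path, "w")
        payload = json.dumps({"tokens": tokenize(line)}).encode("utf-8")
        info = tarfile.TarInfo(name=f"{i:09d}.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    if tar:
        tar.close()
    return shard_paths

paths = shard_corpus(["a b c", "d e", "f g h i", "j"], "text-shard")
```

With shards on disk, training would then stream them back (for instance via `webdataset.WebDataset("text-shard-{000000..000001}.tar")`) instead of holding the tokenized corpus in RAM.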
What are some alternatives?
When comparing GAug and webdataset you can also consider the following projects:
Practical_RL - A course in reinforcement learning in the wild
NYU-DLSP20 - NYU Deep Learning Spring 2020
Made-With-ML - Learn how to design, develop, deploy and iterate on production-grade ML applications.
ffcv - FFCV: Fast Forward Computer Vision (and other ML workloads!)
fastai - The fastai deep learning library
ModelNet40-C - Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296
PySyft - Perform data science on data that remains in someone else's server
TTS - :robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)