webdataset vs ffcv
| | webdataset | ffcv |
|---|---|---|
| Mentions | 7 | 8 |
| Stars | 1,962 | 2,742 |
| Growth | 7.4% | 1.2% |
| Activity | 8.8 | 3.5 |
| Latest commit | 17 days ago | 6 days ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
webdataset
- How to use data stored in a (private) S3 Bucket for training?
  As an alternative, I've looked into using WebDataset, but couldn't figure out how to access data that is stored in a private bucket.
- [D] Title: Best tools and frameworks for working with million-billion image datasets?
- [D] Training networks on extremely large datasets (10+TB)?
  You can try webdataset (https://github.com/webdataset/webdataset).
- Question: TIFF image dataset - size in RAM.
- How to upload large amounts of data to a server?
  Compress it to .tar format and then load it as a webdataset.
- Does MIT 6.824 help for distributed deep learning?
  Would guess not, but there should be some good niche resources: check out the introductory videos here https://github.com/webdataset/webdataset
- How to effectively load a large text dataset with PyTorch?
  I found a pretty good solution that is similar to TFRecord from TensorFlow. You just need to load the data, tokenize it, and save the arrays in shards with the webdataset package.
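On the private-bucket question: WebDataset opens shards by URL, and, as we understand its documentation, it accepts `pipe:` URLs that run a shell command and read the shard from that command's standard output. Pointing the command at the AWS CLI (`aws s3 cp s3://bucket/key -`) lets the CLI supply the private-bucket credentials. The stdlib-only sketch below just builds such a URL list, expanding a `{000000..000002}` shard range in the style of WebDataset's brace notation. The bucket name and key pattern are hypothetical, and it assumes the AWS CLI is installed and configured on the training machine.

```python
from __future__ import annotations

import re

# Matches one numeric brace range such as {000000..000999}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_braces(pattern: str) -> list[str]:
    """Expand a single {lo..hi} numeric range, keeping lo's zero padding.

    Simplified stand-in for WebDataset's brace notation: only one range
    per pattern is supported, and patterns without a range pass through.
    """
    m = _RANGE.search(pattern)
    if m is None:
        return [pattern]
    lo, hi = m.group(1), m.group(2)
    width = len(lo)
    return [
        pattern[: m.start()] + str(i).zfill(width) + pattern[m.end():]
        for i in range(int(lo), int(hi) + 1)
    ]


def s3_pipe_urls(bucket: str, key_pattern: str) -> list[str]:
    """Build 'pipe:' URLs that stream each shard through the AWS CLI.

    `aws s3 cp <src> -` writes the object to stdout using the CLI's own
    configured credentials, which is what grants access to a private bucket.
    """
    return [
        f"pipe:aws s3 cp s3://{bucket}/{key} -"
        for key in expand_braces(key_pattern)
    ]


if __name__ == "__main__":
    # Hypothetical bucket and shard layout, for illustration only.
    for url in s3_pipe_urls("my-private-bucket", "train/shard-{000000..000002}.tar"):
        print(url)
```

The resulting list would then be passed to `wds.WebDataset(urls)`. Since WebDataset expands braces itself, passing the unexpanded `pipe:` string should also work. The explicit expansion here only makes the per-shard commands visible.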
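Two of the answers above suggest packing data into `.tar` shards, and for text, tokenizing first and storing the token arrays in shards. This works because a WebDataset shard is an ordinary tar archive: files sharing a basename up to the first dot (for example `doc000001.txt` and `doc000001.tokens.json`) form one sample. The stdlib-only sketch below writes and reads that layout by hand to show what ends up inside a shard. It is not the `webdataset` package's own writer. The toy whitespace tokenizer, the `tokens.json` extension (JSON stands in for the arrays the poster saved), and the shard size are illustrative assumptions.

```python
import io
import json
import os
import tarfile
import tempfile


def write_shards(samples, out_dir, pattern="shard-%06d.tar", maxcount=2):
    """Write dict samples into tar shards, at most `maxcount` samples per shard.

    Each sample needs a '__key__'; every other entry becomes a tar member named
    '<key>.<ext>', so the files of one sample share a basename and sit together.
    """
    paths, tar = [], None
    for i, sample in enumerate(samples):
        if i % maxcount == 0:  # start a new shard
            if tar is not None:
                tar.close()
            path = os.path.join(out_dir, pattern % len(paths))
            paths.append(path)
            tar = tarfile.open(path, "w")
        key = sample["__key__"]
        for ext, value in sample.items():
            if ext == "__key__":
                continue
            data = value if isinstance(value, bytes) else value.encode("utf-8")
            info = tarfile.TarInfo(name=f"{key}.{ext}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    if tar is not None:
        tar.close()
    return paths


def read_samples(paths):
    """Yield samples by grouping consecutive tar members that share a key."""
    for path in paths:
        with tarfile.open(path, "r") as tar:
            current = None
            for member in tar:
                if not member.isfile():
                    continue
                key, ext = member.name.split(".", 1)  # split at the first dot
                data = tar.extractfile(member).read()
                if current is not None and current["__key__"] != key:
                    yield current
                    current = None
                if current is None:
                    current = {"__key__": key}
                current[ext] = data
            if current is not None:
                yield current


def main():
    vocab = {}

    def tokenize(text):
        # Hypothetical toy tokenizer: one integer id per whitespace-separated word.
        return [vocab.setdefault(word, len(vocab)) for word in text.split()]

    texts = ["the cat sat", "on the mat", "a dog barked"]
    samples = [
        {"__key__": f"doc{i:06d}", "txt": t, "tokens.json": json.dumps(tokenize(t))}
        for i, t in enumerate(texts)
    ]
    with tempfile.TemporaryDirectory() as out_dir:
        paths = write_shards(samples, out_dir, maxcount=2)
        for sample in read_samples(paths):
            print(sample["__key__"], json.loads(sample["tokens.json"]))


if __name__ == "__main__":
    main()
```

Running the module writes two shards and prints `doc000000 [0, 1, 2]`, `doc000001 [3, 0, 4]` and `doc000002 [5, 6, 7]`. In practice, the package's `wds.ShardWriter` produces this layout and `wds.WebDataset(urls)` reads it back. The hand-rolled version only makes the on-disk format explicit.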
ffcv
- Question: TIFF image dataset - size in RAM.
- [P] Composer: a new PyTorch library to train models ~2-4x faster with better algorithms
  PyTorch Lightning is also very slow compared to Composer. You don't have to believe us: our friends who wrote the FFCV library benchmarked us against PTL (see the lower-left plot in the first cluster of graphs), and you can see the difference for yourself. For the same accuracy, the FFCV folks found that Composer is about 5x faster than PTL on ResNet-50 on ImageNet.
- FFCV: Fast Forward Computer Vision
- Does anyone know where I can find research papers for preprocessing large image datasets?
  Maybe something like this? https://github.com/libffcv/ffcv
- Ffcv: Train models at a fraction of the cost with accelerated data loading
- Show HN: FFCV – Accelerated machine learning via fast data loading
- [P] FFCV: Accelerated Model Training via Fast Data Loading
  Hi! You can join the Slack directly from the link on the homepage (ffcv.io)!
What are some alternatives?
Practical_RL - A course in reinforcement learning in the wild
pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]
NYU-DLSP20 - NYU Deep Learning Spring 2020
best-of-ml-python - 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Made-With-ML - Learn how to design, develop, deploy and iterate on production-grade ML applications.
composer - Supercharge Your Model Training
fastai - The fastai deep learning library
apex - A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
ModelNet40-C - Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296
array_storage_benchmark - Compare some methods of array storage in Python (numpy)
PySyft - Perform data science on data that remains in someone else's server
ffcv-imagenet - Train ImageNet *fast* in 500 lines of code with FFCV