| | ffcv | webdataset |
|---|---|---|
| Mentions | 8 | 7 |
| Stars | 2,747 | 1,981 |
| Growth | 0.8% | 4.2% |
| Activity | 3.5 | 8.8 |
| Latest commit | 13 days ago | 24 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
## ffcv
- Question: TIFF image dataset - size in RAM.
- [P] Composer: a new PyTorch library to train models ~2-4x faster with better algorithms
  > PyTorch Lightning is also very slow compared to Composer. You don't have to believe us: our friends who wrote the FFCV library benchmarked us against PTL (see the lower-left plot in the first cluster of graphs), and you can see the difference for yourself. For the same accuracy, the FFCV folks found that Composer is about 5x faster than PTL on ResNet-50 on ImageNet.
- FFCV: Fast Forward Computer Vision
- Does anyone know where I can find research papers for preprocessing large image datasets?
  > Maybe something like this? https://github.com/libffcv/ffcv
- Ffcv: Train models at a fraction of the cost with accelerated data loading
- Show HN: FFCV – Accelerated machine learning via fast data loading
- [P] FFCV: Accelerated Model Training via Fast Data Loading
  > Hi! You can join the Slack directly from the link on the homepage (ffcv.io).
## webdataset
- How to use data stored in a (private) S3 Bucket for training?
  > As an alternative, I've looked into using WebDataset, but couldn't figure out how to access data that is stored in a private bucket.
- [D] Best tools and frameworks for working with million-billion image datasets?
- [D] Training networks on extremely large datasets (10+TB)?
  > You can try webdataset (https://github.com/webdataset/webdataset).
- Question: TIFF image dataset - size in RAM.
- How to upload large amounts of data to a server?
  > Compress it to .tar format and then load it as a webdataset.
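The .tar convention the comment above relies on can be sketched with the standard library alone: WebDataset treats tar members that share a basename as one sample, with the file extension acting as the key (e.g. `jpg` for the image, `cls` for the label). The file names and payloads below are illustrative placeholders, not part of webdataset's API.

```python
import io
import tarfile

def add_member(tar, name, payload: bytes):
    """Add an in-memory payload to the tar under the given member name."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Build a tiny shard in memory; on disk this would be e.g. shard-000000.tar.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for idx in range(3):
        base = f"sample{idx:06d}"
        add_member(tar, f"{base}.jpg", b"<jpeg bytes>")        # image payload (fake)
        add_member(tar, f"{base}.cls", str(idx % 2).encode())  # class label

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
print(names[:2])  # ['sample000000.jpg', 'sample000000.cls']
```

Assuming the webdataset package is installed, a shard laid out this way can then be streamed with something like `webdataset.WebDataset("shard-000000.tar")`, which regroups the members back into samples by basename.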
- Does MIT 6.824 help for distributed deep learning?
  > Would guess not, but there should be some good niche resources: check out the introductory videos here: https://github.com/webdataset/webdataset
- How to effectively load a large text dataset with PyTorch?
  > I found a pretty good solution that is similar to TFRecord from TensorFlow. You just need to load the data, tokenize it, and save the arrays in shards with the webdataset package.
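The sharding step described above can be sketched with the standard library alone. In real use you would write shards with webdataset's `ShardWriter`; this stdlib-only version just shows the layout it produces: a fixed number of samples per .tar shard, one member per sample keyed by extension. The token ids are fabricated for illustration.

```python
import io
import json
import tarfile

def write_shards(samples, samples_per_shard=2):
    """Pack tokenized samples into in-memory .tar shards, N samples per shard."""
    shards = []
    for start in range(0, len(samples), samples_per_shard):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            for i, tokens in enumerate(samples[start:start + samples_per_shard], start):
                payload = json.dumps({"tokens": tokens}).encode()
                info = tarfile.TarInfo(name=f"{i:06d}.json")  # sample key + extension
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
        shards.append(buf.getvalue())
    return shards

# Pretend these are three tokenized sentences.
shards = write_shards([[101, 2023, 102], [101, 3793, 102], [101, 999, 102]])
print(len(shards))  # 2: two samples in the first shard, one in the second
```

At training time each shard is then read sequentially, which is what makes this pattern fast for large text datasets: the loader streams tar members instead of seeking through millions of small files.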
## What are some alternatives?
pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]
Practical_RL - A course in reinforcement learning in the wild
best-of-ml-python - 🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
NYU-DLSP20 - NYU Deep Learning Spring 2020
composer - Supercharge Your Model Training
Made-With-ML - Learn how to design, develop, deploy and iterate on production-grade ML applications.
apex - A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
fastai - The fastai deep learning library
array_storage_benchmark - Compare some methods of array storage in Python (numpy)
ModelNet40-C - Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296
ffcv-imagenet - Train ImageNet *fast* in 500 lines of code with FFCV
PySyft - Perform data science on data that remains in someone else's server