webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
I found a pretty good solution that is similar to TFRecord in TensorFlow: load the data, tokenize it, and save the resulting arrays in shards with the webdataset package.
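A minimal stdlib-only sketch of that workflow, assuming a toy whitespace tokenizer (`tokenize`, `write_shards`, and the `shard-NNNNNN.tar` / `.tokens.json` naming are all hypothetical choices for illustration). It writes plain tar shards following the WebDataset convention, where files sharing a key prefix such as `sample000003.*` form one sample. In practice the webdataset package's own `ShardWriter` would handle the shard rotation, and `webdataset.WebDataset` would read the shards back; the sketch only shows the on-disk layout.

```python
import io
import json
import os
import tarfile
import tempfile


def tokenize(text, vocab):
    """Toy whitespace tokenizer (hypothetical stand-in for a real tokenizer).

    Assigns a new integer ID to every word not yet in `vocab`.
    """
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids


def _add_member(tar, name, payload):
    """Add an in-memory bytes payload to an open tar archive under `name`."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


def write_shards(texts, out_dir, samples_per_shard=2):
    """Tokenize `texts` and write them into tar shards of `samples_per_shard` samples.

    Each sample is stored as two files sharing one key:
    `<key>.tokens.json` (token IDs) and `<key>.txt` (the raw text).
    """
    vocab = {}
    shard_paths = []
    tar = None
    for i, text in enumerate(texts):
        # Start a new shard every `samples_per_shard` samples.
        if i % samples_per_shard == 0:
            if tar is not None:
                tar.close()
            path = os.path.join(out_dir, f"shard-{len(shard_paths):06d}.tar")
            shard_paths.append(path)
            tar = tarfile.open(path, "w")
        key = f"sample{i:06d}"
        ids = tokenize(text, vocab)
        _add_member(tar, f"{key}.tokens.json", json.dumps(ids).encode("utf-8"))
        _add_member(tar, f"{key}.txt", text.encode("utf-8"))
    if tar is not None:
        tar.close()
    return shard_paths, vocab


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths, vocab = write_shards(["the cat sat", "the dog ran", "a cat ran"], d)
        print([os.path.basename(p) for p in paths])  # ['shard-000000.tar', 'shard-000001.tar']
        with tarfile.open(paths[0]) as t:
            print(t.getnames())
```

Building the vocabulary on the fly is only for the demo; a real pipeline would use a fixed, pretrained tokenizer so IDs stay consistent across shards and runs.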
Related posts
- How to use data stored in a (private) S3 Bucket for training?
- [D] Title: Best tools and frameworks for working with million-billion image datasets?
- [D] Training networks on extremely large datasets (10+TB)?
- How to upload large amounts of data to a server?
- Does mit 6.824 help for distributed deep learning?