Top 7 Python distributed-training Projects
-
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
Project mention: Hi, just wanted to know any trusted source to get pre-trained weights of various models(resnet18, resnet34, ViT, SwinT, etc) for datasets like CIFAR10/CIFAR100/STL10/COCO etc | reddit.com/r/learnmachinelearning | 2022-05-05
Also, if you guys could share inference code for the Timm library(https://github.com/rwightman/pytorch-image-models) on CIFAR10/100, STL10 that would be awesome.
-
ColossalAI
Project mention: Train 18-billion-parameter GPT models with a single GPU on your personal computer! Open source project Colossal-AI has added new features! | reddit.com/r/ArtificialInteligence | 2022-05-16
Check out the project over here: https://github.com/hpcaitech/ColossalAI
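To see why training an 18-billion-parameter model on one GPU needs special handling, a back-of-the-envelope estimate helps. The sketch below assumes mixed-precision training with Adam and the 16-bytes-per-parameter breakdown described in the ZeRO paper. It counts model states only (no activations or temporary buffers), is not Colossal-AI code, and `model_state_gb` is a hypothetical helper.

```python
# Back-of-the-envelope memory estimate for model states only (no activations,
# temporary buffers, or fragmentation). Assumes mixed-precision training with
# Adam and the 16-bytes-per-parameter breakdown described in the ZeRO paper.
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam momentum": 4,
    "fp32 Adam variance": 4,
}


def model_state_gb(num_params: float) -> dict:
    """Return the memory (GB, with 1 GB = 1e9 bytes) of each model-state component."""
    return {name: num_params * nbytes / 1e9 for name, nbytes in BYTES_PER_PARAM.items()}


if __name__ == "__main__":
    states = model_state_gb(18e9)  # the 18-billion-parameter GPT from the post
    for name, gb in states.items():
        print(f"{name:>20}: {gb:6.0f} GB")
    print(f"{'total':>20}: {sum(states.values()):6.0f} GB")
```

Under these assumptions the model states alone come to about 288 GB, far more than a single 80 GB GPU provides. Fitting such a model on one card therefore depends on keeping most of that state outside GPU memory, for example in CPU RAM.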
-
determined
Project mention: How to train large deep learning models as a startup | news.ycombinator.com | 2021-10-07
Check out Determined https://github.com/determined-ai/determined to help manage this kind of work at scale: Determined leverages Horovod under the hood, automatically manages cloud resources and can get you up on spot instances, T4s, etc. and will work on your local cluster as well. Gives you additional features like experiment management, scheduling, profiling, model registry, advanced hyperparameter tuning, etc.
Full disclosure: I'm a founder of the project.
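Horovod's core idea is synchronous data parallelism: each worker computes gradients on its own slice of the data, and an allreduce averages those gradients before every update so all workers stay in sync. The pure-Python simulation below illustrates that averaging step on a toy 1-D linear regression. It makes no Horovod or Determined calls, and every function name is hypothetical.

```python
# Conceptual simulation of synchronous data-parallel SGD, the pattern Horovod
# implements with allreduce. Pure Python: no Horovod or Determined APIs are used,
# and all function names here are hypothetical.


def grad_mse(w: float, xs: list, ys: list) -> float:
    """Gradient of the mean squared error for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def allreduce_mean(values: list) -> float:
    """Stand-in for an allreduce that leaves every worker with the mean value."""
    return sum(values) / len(values)


def shard(data: list, num_workers: int) -> list:
    """Split data into equal contiguous shards (assumes len(data) % num_workers == 0)."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def train_data_parallel(xs, ys, num_workers, lr=0.01, steps=200):
    """Each step: one local gradient per shard, average them, apply one shared update."""
    w = 0.0
    x_shards, y_shards = shard(xs, num_workers), shard(ys, num_workers)
    for _ in range(steps):
        local_grads = [grad_mse(w, sx, sy) for sx, sy in zip(x_shards, y_shards)]
        w -= lr * allreduce_mean(local_grads)
    return w


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [2.0 * x + 0.5 for x in xs]  # synthetic targets
    print("4 workers:", train_data_parallel(xs, ys, num_workers=4))
    print("1 worker: ", train_data_parallel(xs, ys, num_workers=1))
```

Because the shards are equal in size, the averaged gradient equals the full-batch gradient, so the 4-worker and 1-worker runs converge to the same weight. In a real cluster, that allreduce is the step where network bandwidth and latency start to matter.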
-
hivemind
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
The problem is that, currently, large ML models need to be trained on clusters of tightly-connected GPUs/accelerators. So it's kinda useless having a bunch of GPUs spread all over the world with huge latency and low bandwidth between them. That may change though - there are people working on it: https://github.com/learning-at-home/hivemind
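The latency and bandwidth point can be quantified with the standard alpha-beta cost model for a ring allreduce: each of the 2(p-1) steps pays one message latency and moves 1/p of the gradient per worker. The sketch below applies it to a hypothetical 1-billion-parameter model with fp16 gradients on 8 workers. The link speeds and latencies are illustrative assumptions rather than measurements, and the model does not describe how hivemind itself averages parameters.

```python
# Alpha-beta cost model for a ring allreduce (reduce-scatter + allgather).
# All numbers in the example are illustrative assumptions, not measurements.


def ring_allreduce_seconds(num_bytes: float, workers: int,
                           bandwidth_bytes_per_s: float, latency_s: float) -> float:
    """Estimated time of one ring allreduce of num_bytes across `workers` peers."""
    steps = 2 * (workers - 1)             # p-1 reduce-scatter + p-1 allgather steps
    chunk = num_bytes / workers           # each step sends 1/p of the buffer
    return steps * latency_s + steps * chunk / bandwidth_bytes_per_s


if __name__ == "__main__":
    grad_bytes = 1e9 * 2  # hypothetical: 1B parameters, 2-byte (fp16) gradients
    links = {
        "datacenter (100 Gbit/s, 10 us)": (100e9 / 8, 10e-6),
        "home internet (100 Mbit/s, 50 ms)": (100e6 / 8, 50e-3),
    }
    for name, (bandwidth, latency) in links.items():
        t = ring_allreduce_seconds(grad_bytes, 8, bandwidth, latency)
        print(f"{name}: {t:.2f} s per gradient sync")
```

With these numbers one gradient synchronization takes about 0.28 s over the datacenter link and about 281 s over the home connections, a gap of roughly three orders of magnitude.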
-
AdaptDL
-
alpa
GitHub code: https://github.com/alpa-projects/alpa
-
HandyRL
HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
Project mention: Suggestions for board game reinforcement learning methods, frameworks | reddit.com/r/reinforcementlearning | 2022-03-24
Python distributed-training related posts
- Train 18-billion-parameter GPT models with a single GPU on your personal computer! Open source project Colossal-AI has added new features!
- [P] Scalable PaLM implementation of PyTorch
- Colossal-AI: A Unified Deep Learning System for Large-Scale Training
- [P]Change few lines of code, enjoy your high tea, and you've got a pre-training ViT-B-32 :P
- Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
- [P]Change few lines of codes, and then accelerate AI model training by 10x
- Now you can train ViT in half an hour!
Index
What are some of the best open-source distributed-training projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | pytorch-image-models | 18,509
2 | ColossalAI | 3,508
3 | determined | 1,701
4 | hivemind | 1,022
5 | adaptdl | 294
6 | alpa | 264
7 | HandyRL | 225