Python distributed-training

Open-source Python projects categorized as distributed-training

Top 7 Python distributed-training Projects

  • pytorch-image-models

    PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more

    Project mention: Hi, just wanted to know any trusted source to get pre-trained weights of various models (resnet18, resnet34, ViT, SwinT, etc.) for datasets like CIFAR10/CIFAR100/STL10/COCO etc. | 2022-05-05

    Also, if you guys could share inference code for the timm library (on CIFAR10/100, STL10), that would be awesome.

  • ColossalAI

    Colossal-AI: A Unified Deep Learning System for Big Model Era

    Project mention: Train 18-billion-parameter GPT models with a single GPU on your personal computer! Open source project Colossal-AI has added new features! | 2022-05-16

    Check out the project over here:

  • determined

    Determined: Deep Learning Training Platform

    Project mention: How to train large deep learning models as a startup | 2021-10-07

    Check out Determined to help manage this kind of work at scale: Determined leverages Horovod under the hood, automatically manages cloud resources, can get you up on spot instances, T4s, etc., and will work on your local cluster as well. It also gives you additional features like experiment management, scheduling, profiling, model registry, advanced hyperparameter tuning, etc.

    Full disclosure: I'm a founder of the project.

  • hivemind

    Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

    Project mention: GPT-3 Is No Longer the Only Game in Town | 2021-11-07

    The problem is that, currently, large ML models need to be trained on clusters of tightly-connected GPUs/accelerators. So it's kinda useless having a bunch of GPUs spread all over the world with huge latency and low bandwidth between them. That may change though - there are people working on it:

  • adaptdl

    Resource-adaptive cluster scheduler for deep learning training.

    Project mention: Introduction to PyTorch | 2022-05-02

  • alpa

    Auto parallelization for large-scale neural networks

    Project mention: Alpa: Automated Model-Parallel Deep Learning | 2022-05-03

    GitHub code:

  • HandyRL

    HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.

    Project mention: Suggestions for board game reinforcement learning methods, frameworks | 2022-03-24

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-05-16.

What are some of the best open-source distributed-training projects in Python? This list will help you:

#  Project               Stars
1  pytorch-image-models  18,509
2  ColossalAI             3,508
3  determined             1,701
4  hivemind               1,022
5  adaptdl                  294
6  alpa                     264
7  HandyRL                  225