Top 9 Python distributed-training Projects

  • pytorch-image-models

    PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more

  • FedML

    FedML - A federated learning and analytics library for secure, collaborative machine learning on decentralized data, anywhere and at any scale. It supports large-scale cross-silo federated learning, cross-device federated learning on smartphones and IoT devices, and research simulation. MLOps and an App Marketplace are also available.

  • skypilot

    SkyPilot is a framework for easily running machine learning workloads on any cloud through a unified interface.

  • alpa

    Training and serving large-scale neural networks

  • hivemind

    Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

  • adaptdl

    Resource-adaptive cluster scheduler for deep learning training.

  • HandyRL

    HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.

  • distributed-diffusion

    Train a Stable Diffusion model over the internet with Hivemind

  • Fast-Kubeflow

    This repo covers the Kubeflow environment with labs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: hyperparameter tuning), KFServing (model serving), Training Operators (distributed training), Projects, etc.

