Python distributed-training

Open-source Python projects categorized as distributed-training

Top 9 Python distributed-training Projects

  • pytorch-image-models

    PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more

    Project mention: Inference on resent, cant work out the problem? | reddit.com/r/MLQuestions | 2023-05-11

    Additionally, you might find the timm library handy for this sort of work.

  • FedML

    FedML - The federated learning and analytics library enabling secure and collaborative machine learning on decentralized data anywhere at any scale. Supporting large-scale cross-silo federated learning, cross-device federated learning on smartphones/IoTs, and research simulation. MLOps and App Marketplace are also enabled (https://open.fedml.ai).

    Project mention: Awesome-Federated-Learning: A curated list of federated learning publications, re-organized from Arxiv (mostly). | reddit.com/r/FederatedLearning | 2023-03-30

  • skypilot

    SkyPilot is a framework for easily running machine learning workloads on any cloud through a unified interface.

    Project mention: Show HN: Cloud Agnostic AI Platform | news.ycombinator.com | 2023-05-29

    Interesting, happy to chat and provide feedback, as I have been working in this field for the last few years. Did you by any chance get inspiration from this paper (https://arxiv.org/pdf/2205.07147.pdf) and its recent implementation (https://github.com/skypilot-org/skypilot)?

  • alpa

    Training and serving large-scale neural networks

    Project mention: How to Train Large Models on Many GPUs? | news.ycombinator.com | 2023-02-11

    - Alpa does training and serving with 175B parameter models https://github.com/alpa-projects/alpa

  • hivemind

    Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

    Project mention: Do you think that AI research will slow down to a halt because of regulation? | reddit.com/r/singularity | 2023-05-21

    Not if we rise to meet that challenge. Here are a few tools that facilitate AI research in the face of an advanced persistent threat: Hivemind, a distributed PyTorch framework

  • adaptdl

    Resource-adaptive cluster scheduler for deep learning training.

  • HandyRL

    HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.

  • distributed-diffusion

    Train a Stable Diffusion model over the internet with Hivemind

    Project mention: What is Midjourney doing better than us? | reddit.com/r/StableDiffusion | 2023-04-04

    Noob here. I don't know much about how the community could contribute by reinforcing a shared training run, but this is maybe what we should aim for. Imagine users contributing to the training of a large model, with a system of upvotes (like Midjourney). They have control over the model and keep reinforcing it. We are fragmented across multiple models, LoRAs and such, with everyone focusing on different things. I did some research a while ago and ended up here: https://github.com/chavinlo/distributed-diffusion and here: https://learning-at-home.github.io/

  • Fast-Kubeflow

    This repo covers Kubeflow Environment with LABs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: Hyperparameter Tuning), KFServe (Model Serving), Training Operators (Distributed Training), Projects, etc.

    Project mention: Fast-Kubeflow: Kubeflow Tutorial, Sample Usage Scenarios (Howto: Hands-on LAB) | reddit.com/r/mlops | 2023-01-04

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-05-29.

Index

What are some of the best open-source distributed-training projects in Python? This list will help you:

#  Project                Stars
1  pytorch-image-models   25,471
2  FedML                  2,815
3  skypilot               2,635
4  alpa                   2,489
5  hivemind               1,512
6  adaptdl                363
7  HandyRL                266
8  distributed-diffusion  135
9  Fast-Kubeflow          40