Python distributed-training

Open-source Python projects categorized as distributed-training

Top 8 Python distributed-training Projects

  • pytorch-image-models

    PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

  • Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  • skypilot

    SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

  • Project mention: Ask HN: Most efficient way to fine-tune an LLM in 2024? | news.ycombinator.com | 2024-04-04
  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • FedML

    FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://fedml.ai) is your generative AI platform at scale.

  • Project mention: [Experiment] The future of AI is open-source, and here is the plan | /r/samkoesnadi | 2023-06-05

    FedML https://github.com/FedML-AI/FedML might already provide a lot of tools to do the job

  • alpa

    Training and serving large-scale neural networks with auto parallelization.

  • hivemind

    Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

  • Project mention: You can now train a 70B language model at home | news.ycombinator.com | 2024-03-07

    https://github.com/learning-at-home/hivemind is also relevant

  • adaptdl

    Resource-adaptive cluster scheduler for deep learning training.

  • HandyRL

    HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • Fast-Kubeflow

    This repo covers Kubeflow Environment with LABs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: Hyperparameter Tuning), KFServe (Model Serving), Training Operators (Distributed Training), Projects, etc.

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

Python distributed-training related posts

Index

What are some of the best open-source distributed-training projects in Python? This list will help you:

Project Stars
1 pytorch-image-models 29,659
2 skypilot 5,602
3 FedML 4,052
4 alpa 2,983
5 hivemind 1,833
6 adaptdl 395
7 HandyRL 282
8 Fast-Kubeflow 69

Sponsored
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com