distributed-training

Open-source projects categorized as distributed-training

Top 14 distributed-training Open-Source Projects

  • Made-With-ML

    Learn how to design, develop, deploy and iterate on production-grade ML applications.

  • Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13

    Made With ML

  • pytorch-image-models

    PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

  • Project mention: FLaNK AI Weekly 18 March 2024 | dev.to | 2024-03-18
  • PaddlePaddle

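    PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle, 『飞桨』: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)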

  • Project mention: List of AI-Models | /r/GPT_do_dah | 2023-05-16

  • skypilot

    SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

  • Project mention: Ask HN: Most efficient way to fine-tune an LLM in 2024? | news.ycombinator.com | 2024-04-04
  • FedML

    FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://fedml.ai) is your generative AI platform at scale.

  • Project mention: [Experiment] The future of AI is open-source, and here is the plan | /r/samkoesnadi | 2023-06-05

    FedML https://github.com/FedML-AI/FedML might already provide a lot of tools to do the job

  • adanet

    Fast and flexible AutoML with learning guarantees.

  • alpa

    Training and serving large-scale neural networks with auto parallelization.

  • determined

    Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.

  • Project mention: Open Source Advent Fun Wraps Up! | dev.to | 2024-01-05

    17. Determined AI | Github | tutorial

  • hivemind

    Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.

  • Project mention: You can now train a 70B language model at home | news.ycombinator.com | 2024-03-07

    https://github.com/learning-at-home/hivemind is also relevant

  • efficient-dl-systems

    Efficient Deep Learning Systems course materials (HSE, YSDA)

  • Project mention: Efficient Deep Learning Systems Course (Yandex/HSE) | news.ycombinator.com | 2024-01-19
  • relora

    Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates

  • Project mention: ReLoRA: High-Rank Training Through Low-Rank Updates | news.ycombinator.com | 2023-12-21
  • adaptdl

    Resource-adaptive cluster scheduler for deep learning training.

  • HandyRL

    HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.

  • Fast-Kubeflow

    This repo covers Kubeflow Environment with LABs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: Hyperparameter Tuning), KFServe (Model Serving), Training Operators (Distributed Training), Projects, etc.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

distributed-training related posts

  • ReLoRA: High-Rank Training Through Low-Rank Updates

    1 project | news.ycombinator.com | 21 Dec 2023
  • Would anyone be interested in contributing to some group projects?

    4 projects | /r/learnmachinelearning | 24 Aug 2023
  • Hive mind: Train deep learning models on thousands of volunteers across the world

    1 project | news.ycombinator.com | 20 Jun 2023
  • Could a model not be trained by a decentralized network? Like Seti @ home or kinda-sorta like bitcoin. Petals accomplishes this somewhat, but if raw computer power is the only barrier to open-source I'd be happy to try organizing decentralized computing efforts

    2 projects | /r/LocalLLaMA | 17 Jun 2023
  • Orca (built on llama13b) looks like the new sheriff in town

    2 projects | /r/LocalLLaMA | 6 Jun 2023
  • [Experiment] The future of AI is open-source, and here is the plan

    1 project | /r/samkoesnadi | 5 Jun 2023
  • Do you think that AI research will slow down to a halt because of regulation?

    1 project | /r/singularity | 21 May 2023

Index

What are some of the best open-source distributed-training projects? This list will help you:

Rank  Project                 Stars
1     Made-With-ML            35,702
2     pytorch-image-models    29,828
3     PaddlePaddle            21,625
4     skypilot                 5,675
5     FedML                    4,062
6     adanet                   3,470
7     alpa                     2,986
8     determined               2,868
9     hivemind                 1,840
10    efficient-dl-systems       580
11    relora                     399
12    adaptdl                    395
13    HandyRL                    282
14    Fast-Kubeflow               70
