Top 14 distributed-training Open-Source Projects
-
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
-
PaddlePaddle
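PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)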
-
skypilot
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
-
FedML
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, FEDML Nexus AI (https://fedml.ai) is your generative AI platform at scale.
-
determined
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
-
hivemind
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
-
relora
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
-
HandyRL
HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
-
Fast-Kubeflow
This repo covers Kubeflow Environment with LABs: Kubeflow GUI, Jupyter Notebooks on pods, Kubeflow Pipelines, Experiments, KALE, KATIB (AutoML: Hyperparameter Tuning), KFServe (Model Serving), Training Operators (Distributed Training), Projects, etc.
Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13
Made With ML
Project mention: Ask HN: Most efficient way to fine-tune an LLM in 2024? | news.ycombinator.com | 2024-04-04
Project mention: [Experiment] The future of AI is open-source, and here is the plan | /r/samkoesnadi | 2023-06-05
FedML https://github.com/FedML-AI/FedML might already provide a lot of tools to do the job
17. Determined AI | Github | tutorial
https://github.com/learning-at-home/hivemind is also relevant
Project mention: Efficient Deep Learning Systems Course (Yandex/HSE) | news.ycombinator.com | 2024-01-19
Project mention: ReLoRA: High-Rank Training Through Low-Rank Updates | news.ycombinator.com | 2023-12-21
distributed-training related posts
-
ReLoRA: High-Rank Training Through Low-Rank Updates
-
Would anyone be interested in contributing to some group projects?
-
Hivemind: Train deep learning models on thousands of volunteers across the world
-
Could a model not be trained by a decentralized network? Like Seti @ home or kinda-sorta like bitcoin. Petals accomplishes this somewhat, but if raw computer power is the only barrier to open-source I'd be happy to try organizing decentralized computing efforts
-
Orca (built on llama13b) looks like the new sheriff in town
-
[Experiment] The future of AI is open-source, and here is the plan
-
Do you think that AI research will slow down to a halt because of regulation?
-
Index
What are some of the best open-source distributed-training projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | Made-With-ML | 35,702 |
2 | pytorch-image-models | 29,828 |
3 | PaddlePaddle | 21,625 |
4 | skypilot | 5,675 |
5 | FedML | 4,062 |
6 | adanet | 3,470 |
7 | alpa | 2,986 |
8 | determined | 2,868 |
9 | hivemind | 1,840 |
10 | efficient-dl-systems | 580 |
11 | relora | 399 |
12 | adaptdl | 395 |
13 | HandyRL | 282 |
14 | Fast-Kubeflow | 70 |