sagemaker-training-toolkit
Train machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
I'm using sagemaker-training-toolkit for hyperparameter optimization and trying to take advantage of all the cores on each machine via its MPI options (which, to my understanding, use Horovod with MPI). I'm fairly new to this space and can't find anything that describes in lay terms how training works in this distributed model. With AllReduce, how often does the reduce step happen? I'm trying to figure out whether all training processes are training a shared model, such that every process is always training on the "latest" version of the model.
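
To make my mental model concrete, here is a toy, pure-Python sketch of what I *think* synchronous data-parallel AllReduce does (all names here are hypothetical; this doesn't use any SageMaker or Horovod APIs, it just simulates the averaging step):

```python
# Toy simulation of synchronous data-parallel training with AllReduce.
# All names are hypothetical; no SageMaker/Horovod code involved.

NUM_WORKERS = 4
STEPS = 3

def local_gradient(worker_id, step):
    # Stand-in for a gradient computed on that worker's mini-batch shard.
    return float(worker_id + 1) * (step + 1)

def allreduce_mean(values):
    # Conceptually what allreduce(SUM) / world_size does:
    # every worker ends up holding the same averaged value.
    return sum(values) / len(values)

# Each worker holds its own replica of the (here: scalar) model weights.
weights = [0.0] * NUM_WORKERS
lr = 0.1

for step in range(STEPS):
    grads = [local_gradient(w, step) for w in range(NUM_WORKERS)]
    avg = allreduce_mean(grads)                 # reduce once per step/batch?
    weights = [w - lr * avg for w in weights]   # identical update on every replica
    assert len(set(weights)) == 1               # replicas stay in lockstep

print(weights[0])
```

If this picture is right, the "reduce" happens once per training step (per mini-batch), and every worker applies the same averaged gradient, so all replicas hold identical, up-to-date weights. Is that an accurate description of what the MPI/Horovod mode does?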