[Discussion] Open source scheduler and queuing system for model training/inferencing tasks?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • adaptdl

    Resource-adaptive cluster scheduler for deep learning training.

  • Check out https://github.com/petuum/adaptdl. It natively supports AWS/EKS (for the autoscaling feature); otherwise, it can run anywhere on Kubernetes.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Reduce cost by 3x in the cloud and improve GPU usage in shared clusters with AdaptDL for PyTorch

    1 project | /r/u_Henry-GO | 24 Mar 2021
  • How we were able to achieve hyper-parameter tuning (HPT) for deep learning workflows at 1.5x faster in our clusters and 3x cheaper on AWS

    1 project | /r/learnmachinelearning | 23 Feb 2021
  • [D] Anyone deploy DL models with AWS Lambda? Trying to estimate costs

    2 projects | /r/MachineLearning | 5 Apr 2021
  • SB-1047 will stifle open-source AI and decrease safety

    2 projects | news.ycombinator.com | 29 Apr 2024
  • Getting Started with Gemma Models

    4 projects | dev.to | 15 Apr 2024