Airflow + Slurm for ML Training Pipelines?

This page summarizes the projects mentioned and recommended in the original post on /r/mlops.

  • aws-sfn-resume-from-any-state

    Resume failed state machines midstream and skip all previously succeeded steps.

  • A state machine is not easy to resume from where it failed. You either need to rerun it entirely, use this tool to recreate the previous run dynamically (which is a bit too hacky for my taste), or simply string the remaining batch jobs together manually.

  • paradigm

    Hassle-free ML Pipelines on Kubernetes

  • Prefect is a good choice, but I wanted a much simpler tool, so I built a barebones workflow controller here.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
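Both comments above are about one idea: when a multi-step training pipeline fails partway, rerun only the steps that did not succeed. Below is a minimal sketch of that resume-from-failure pattern. It assumes a purely sequential pipeline and a local JSON checkpoint file. `run_pipeline`, the `(name, callable)` step tuples, and the checkpoint format are all hypothetical. They are not taken from aws-sfn-resume-from-any-state, paradigm, Prefect, or the AWS Step Functions API.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical step type: a unique step name plus a zero-argument callable.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], checkpoint: Path) -> List[str]:
    """Run steps in order, skipping any recorded as succeeded in `checkpoint`.

    Returns the names of the steps actually executed in this run.
    """
    # Load the set of previously succeeded steps (empty on the first run).
    done: Dict[str, bool] = {}
    if checkpoint.exists():
        done = json.loads(checkpoint.read_text())

    executed: List[str] = []
    for name, fn in steps:
        if done.get(name):
            continue  # succeeded in an earlier run; do not redo it
        fn()  # an exception here aborts the run; earlier progress is already saved
        executed.append(name)
        done[name] = True
        # Persist after every step so a crash loses at most the current step.
        checkpoint.write_text(json.dumps(done))
    return executed
```

For example, if a pipeline of `prep`, `train`, and `eval` fails inside `train`, the checkpoint records only `prep`. Calling `run_pipeline` again with the same checkpoint path then starts at `train`. Deleting the checkpoint file forces a full rerun. A real orchestrator such as Airflow or Step Functions would keep this state in its own backend instead of a local file.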
