Kubernetes Was Never Designed for Batch Jobs

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • armada

    A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.

  • Another aspect of batch jobs is that we’ll often want to run distributed computations where we split our data into chunks and run a function on each chunk. One popular option is to run Spark, which is built for exactly this use case, on top of Kubernetes, and there are other tools that make running distributed computations on Kubernetes easier. A minimal PySpark sketch of the pattern follows this entry.

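To make the chunk-and-map pattern concrete, here is a minimal PySpark sketch. The chunk count, the input range, and the per-chunk function are hypothetical placeholders; on Kubernetes, the same script would typically be launched with spark-submit against a k8s:// master.

```python
# A minimal sketch of "split the data into chunks, run a function on each
# chunk" with PySpark. process_chunk and the input range are placeholders.
from pyspark.sql import SparkSession

def process_chunk(rows):
    # Hypothetical per-chunk computation: here, just count the rows.
    yield sum(1 for _ in rows)

spark = SparkSession.builder.appName("chunked-batch").getOrCreate()

# Spread the work items across the cluster, one partition per chunk.
rdd = spark.sparkContext.parallelize(range(100_000), numSlices=100)
results = rdd.mapPartitions(process_chunk).collect()

print(f"processed {len(results)} chunks")
spark.stop()
```
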
  • kube-batch

    Discontinued. A batch scheduler for Kubernetes aimed at high-performance workloads, e.g. AI/ML, big data, HPC

  • kubernetes

    Production-Grade Container Scheduling and Management

  • The simplest option is to create a single Job object per task. As the documentation points out, this won’t work well with a large number of tasks; one user’s experience is that it’s hard to go beyond a few thousand Jobs in total. The best way around that seems to be Indexed Jobs, a relatively new feature for running multiple copies of the same Job where each copy gets a different value of the JOB_COMPLETION_INDEX environment variable. This gives us the most fundamental layer for running distributed jobs. As long as each task needs only an index number and doesn’t need to “send back” any outputs, this works: e.g., if all of the tasks work on a single file, each task “knows” to skip the first JOB_COMPLETION_INDEX * n rows, process the next n rows, and write its output to a database. A sketch of such a worker follows this entry.

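As a sketch of that pattern, the worker below reads JOB_COMPLETION_INDEX (which Kubernetes sets on each pod of an Indexed Job) and processes its own slice of a shared file. The input path, chunk size, and process function are assumptions for illustration, not part of the original post.

```python
# worker.py: a sketch of one task in an Indexed Job. Kubernetes sets
# JOB_COMPLETION_INDEX; the input path, chunk size, and process() body
# are hypothetical placeholders.
import os

CHUNK_SIZE = 1000  # n rows per task (an assumption for this sketch)

def process(line: str) -> None:
    # Placeholder: a real task might write a result row to a database.
    print(len(line))

index = int(os.environ["JOB_COMPLETION_INDEX"])

with open("/data/input.csv") as f:  # hypothetical shared input file
    for _ in range(index * CHUNK_SIZE):  # skip rows owned by earlier indexes
        next(f, None)
    for _ in range(CHUNK_SIZE):          # process this task's n rows
        line = next(f, None)
        if line is None:
            break
        process(line)
```
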
  • RabbitMQ

    Open source RabbitMQ: core server and tier 1 (built-in) plugins

  • One option would be to set up an NFS (Network File System) share that’s accessible from outside the cluster and expose it to our pods in the cluster. The other option is, as usual, a service of some sort: we could use a queue like RabbitMQ that temporarily stores our data and makes it available to our pod, as in the sketch after this entry.

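As a sketch of the queue option, the snippet below uses the pika client to push a task into RabbitMQ and pull it back out from a pod. The host name, queue name, and message body are assumed names for illustration; in practice the producer and consumer would run in separate processes.

```python
# A sketch of RabbitMQ as the hand-off point between the outside world and
# pods in the cluster. Host "rabbitmq", queue "tasks", and the message body
# are assumptions for this sketch.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)

# Producer side: enqueue one message per task.
channel.basic_publish(exchange="", routing_key="tasks", body=b"chunk-0042")

# Consumer side (inside a pod): fetch one task, process it, acknowledge it.
method, _properties, body = channel.basic_get(queue="tasks", auto_ack=False)
if method is not None:
    print(f"processing {body!r}")
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection.close()
```
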
  • Docker Compose

    Define and run multi-container applications with Docker

  • In fact, in Docker, “Compose” means combining services.
