Kubernetes Was Never Designed for Batch Jobs

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • armada

    A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.

  • Another aspect of batch jobs is that we’ll often want to run distributed computations where we split our data into chunks and run a function on each chunk. One popular option is to run Spark, which is built for exactly this use case, on top of Kubernetes, and there are other tools that make running distributed computations on Kubernetes easier. A minimal PySpark sketch of the pattern follows this entry.

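To make the chunk-and-map pattern concrete, here is a minimal PySpark sketch. The chunk count, the input range, and the per-chunk function are hypothetical placeholders; on Kubernetes, the same script would typically be launched with spark-submit against a k8s:// master.

```python
# A minimal sketch of "split the data into chunks, run a function on each
# chunk" with PySpark. process_chunk and the input range are placeholders.
from pyspark.sql import SparkSession

def process_chunk(rows):
    # Hypothetical per-chunk computation: here, just count the rows.
    yield sum(1 for _ in rows)

spark = SparkSession.builder.appName("chunked-batch").getOrCreate()

# Spread the work items across the cluster, one partition per chunk.
rdd = spark.sparkContext.parallelize(range(100_000), numSlices=100)
results = rdd.mapPartitions(process_chunk).collect()

print(f"processed {len(results)} chunks")
spark.stop()
```
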
  • kube-batch

    Discontinued. A batch scheduler for Kubernetes aimed at high-performance workloads, e.g. AI/ML, big data, HPC

  • kubernetes

    Production-Grade Container Scheduling and Management

  • The simplest option is to create a single Job object per task. As the documentation points out, this won’t work well with a large number of tasks; one user’s experience is that it’s hard to go beyond a few thousand Jobs in total. The best way around that seems to be Indexed Jobs, a relatively new feature for running multiple copies of the same Job where each copy gets a different value of the JOB_COMPLETION_INDEX environment variable. This gives us the most fundamental layer for running distributed jobs. As long as each task needs only an index number and doesn’t need to “send back” any outputs, this works: e.g., if all of the tasks work on a single file, each task “knows” to skip the first JOB_COMPLETION_INDEX * n rows, process the next n rows, and write its output to a database. A sketch of such a worker follows this entry.

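As a sketch of that pattern, the worker below reads JOB_COMPLETION_INDEX (which Kubernetes sets on each pod of an Indexed Job) and processes its own slice of a shared file. The input path, chunk size, and process function are assumptions for illustration, not part of the original post.

```python
# worker.py: a sketch of one task in an Indexed Job. Kubernetes sets
# JOB_COMPLETION_INDEX; the input path, chunk size, and process() body
# are hypothetical placeholders.
import os

CHUNK_SIZE = 1000  # n rows per task (an assumption for this sketch)

def process(line: str) -> None:
    # Placeholder: a real task might write a result row to a database.
    print(len(line))

index = int(os.environ["JOB_COMPLETION_INDEX"])

with open("/data/input.csv") as f:  # hypothetical shared input file
    for _ in range(index * CHUNK_SIZE):  # skip rows owned by earlier indexes
        next(f, None)
    for _ in range(CHUNK_SIZE):          # process this task's n rows
        line = next(f, None)
        if line is None:
            break
        process(line)
```
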
  • RabbitMQ

    Open source RabbitMQ: core server and tier 1 (built-in) plugins

  • One option would be to set up an NFS (Network File System) share that’s accessible from outside the cluster and expose it to our pods in the cluster. The other option is, as usual, a service of some sort: we could use a queue like RabbitMQ that temporarily stores our data and makes it available to our pod, as in the sketch after this entry.

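As a sketch of the queue option, the snippet below uses the pika client to push a task into RabbitMQ and pull it back out from a pod. The host name, queue name, and message body are assumed names for illustration; in practice the producer and consumer would run in separate processes.

```python
# A sketch of RabbitMQ as the hand-off point between the outside world and
# pods in the cluster. Host "rabbitmq", queue "tasks", and the message body
# are assumptions for this sketch.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)

# Producer side: enqueue one message per task.
channel.basic_publish(exchange="", routing_key="tasks", body=b"chunk-0042")

# Consumer side (inside a pod): fetch one task, process it, acknowledge it.
method, _properties, body = channel.basic_get(queue="tasks", auto_ack=False)
if method is not None:
    print(f"processing {body!r}")
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection.close()
```
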
  • Docker Compose

    Define and run multi-container applications with Docker

  • In fact, in Docker, “Compose” means combining services.
