-
kube-batch
Discontinued. A batch scheduler for Kubernetes for high-performance workloads, e.g. AI/ML, BigData, HPC
> That said, strain on the kube-scheduler is spiky. A new job may consist of many hundreds of pods all being created at once, then return to a relatively low rate of churn.
Last I checked, the default scheduler places Pods one at a time. It might be advantageous to use a gang/batch scheduler like kube-batch[0], Poseidon[1] or DCM[2].
[0] https://github.com/kubernetes-sigs/kube-batch
[1] https://github.com/kubernetes-sigs/poseidon
[2] https://github.com/vmware/declarative-cluster-management
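To make the one-at-a-time vs. gang distinction concrete, here is a minimal toy sketch in Python. It is not how kube-scheduler or kube-batch are actually implemented: the `Node` class, the CPU-only resource model, the first-fit placement and both function names are assumptions made up for illustration. It only shows the all-or-nothing idea behind gang scheduling.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node model: a name and free CPU cores only.
    name: str
    free_cpu: int


def place_one_at_a_time(pod_cpus, nodes):
    """Bind each pod to the first node that fits, one pod at a time.

    Pods that fit nowhere are skipped, so a job can end up partially
    placed while still holding resources.
    """
    placed = []
    for i, cpu in enumerate(pod_cpus):
        for node in nodes:
            if node.free_cpu >= cpu:
                node.free_cpu -= cpu
                placed.append((i, node.name))
                break
    return placed


def place_gang(pod_cpus, nodes):
    """All-or-nothing: commit the placement only if every pod fits."""
    # Plan against a scratch copy so a failed attempt holds nothing.
    trial = {n.name: n.free_cpu for n in nodes}
    plan = []
    for i, cpu in enumerate(pod_cpus):
        target = next((name for name, free in trial.items() if free >= cpu), None)
        if target is None:
            return []  # job does not fit as a whole; nodes stay untouched
        trial[target] -= cpu
        plan.append((i, target))
    # Every pod fits, so commit the plan to the real nodes.
    for node in nodes:
        node.free_cpu = trial[node.name]
    return plan
```

With two 4-core nodes and a job of three 3-core pods, `place_one_at_a_time` binds two pods and strands the third, while `place_gang` binds nothing and leaves both nodes free for a job that can actually run. For tightly coupled workloads like MPI, where a partially started job cannot make progress, that difference is the main argument for a gang scheduler.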
-
The problem with Slurm is how it's typically used: you SSH into a shared login node with a shared file system, auth is mostly handled by Linux user accounts, and you submit jobs with sbatch. A Kubernetes deployment feels much more modern and safe.
I have worked with containers + Slurm, where the vendor libmpi is injected into the container runtime [1] by a hook, which gives you close to bare-metal performance with some container goodness in terms of isolation and deployment.
-
Hi, kube-scheduler maintainer here, currently looking into enabling MPI use cases in k8s.
We started a discussion in https://github.com/kubeflow/mpi-operator/issues/315