Scaling Kubernetes to 7,500 Nodes

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • kube-batch

    Discontinued. A batch scheduler for Kubernetes for high-performance workloads, e.g. AI/ML, big data, HPC.

    > That said, strain on the kube-scheduler is spiky. A new job may consist of many hundreds of pods all being created at once, then return to a relatively low rate of churn.

    Last I checked, the default scheduler places Pods one at a time. It might be advantageous to use a gang/batch scheduler like kube-batch[0], Poseidon[1] or DCM[2].

    [0] https://github.com/kubernetes-sigs/kube-batch

    [1] https://github.com/kubernetes-sigs/poseidon

    [2] https://github.com/vmware/declarative-cluster-management
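
To make the difference between one-at-a-time placement and gang scheduling concrete, here is a minimal toy sketch. It does not use the API of kube-scheduler, kube-batch, Poseidon, or DCM; the node names, CPU figures, and the functions `place_one`, `schedule_one_at_a_time`, and `schedule_gang` are all hypothetical, and real schedulers weigh far more than free CPU.

```python
from typing import Dict, List, Optional


def place_one(free: Dict[str, int], cpu: int) -> Optional[str]:
    """Reserve `cpu` on the first node with enough free capacity."""
    for node, avail in free.items():
        if avail >= cpu:
            free[node] = avail - cpu
            return node
    return None  # no node can fit this pod


def schedule_one_at_a_time(free: Dict[str, int], pods: List[int]) -> List[Optional[str]]:
    """Place each pod independently; a job may end up only partially placed."""
    return [place_one(free, cpu) for cpu in pods]


def schedule_gang(free: Dict[str, int], pods: List[int]) -> Optional[List[str]]:
    """Place every pod of the job, or reserve nothing at all."""
    trial = dict(free)  # work on a copy so a failed attempt has no side effects
    placement: List[str] = []
    for cpu in pods:
        node = place_one(trial, cpu)
        if node is None:
            return None  # one pod does not fit, so the whole job waits
        placement.append(node)
    free.update(trial)  # commit the reservations only when all pods fit
    return placement


# Hypothetical cluster: two nodes with 4 free CPUs each; a job of three 3-CPU pods.
job = [3, 3, 3]

nodes_a = {"a": 4, "b": 4}
partial = schedule_one_at_a_time(nodes_a, job)

nodes_b = {"a": 4, "b": 4}
gang = schedule_gang(nodes_b, job)

print(partial, nodes_a)
print(gang, nodes_b)
```

In this sketch the one-at-a-time path places two of the three pods and leaves them holding capacity while the third stays pending, whereas the gang path reserves nothing until the whole job fits. For tightly coupled workloads that cannot start until every worker is running, that all-or-nothing behavior is the usual argument for a gang/batch scheduler.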

  • sarus

    OCI-compatible engine to deploy Linux containers on HPC environments.

    The problem with Slurm is how it's typically used: you ssh into a shared login node with a shared file system, auth is mostly handled by Linux user accounts, and you submit jobs with sbatch. A Kubernetes deployment feels much more modern and safe.

    I have worked with containers + Slurm, where the vendor libmpi is injected by a hook in the container runtime [1], which gives you close to bare-metal performance with some container goodness in terms of isolation and deployment.

    [1] https://github.com/eth-cscs/sarus

  • mpi-operator

    Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)

    Hi, kube-scheduler maintainer here, currently looking into enabling MPI use cases in k8s.

    We started a discussion in https://github.com/kubeflow/mpi-operator/issues/315
