|5 days ago||6 months ago|
|Apache License 2.0||Apache License 2.0|
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
My Journey With Spark On Kubernetes... In Python (1/3)
4 projects | dev.to | 12 Apr 2021
For our experiments, we will use Volcano, a batch scheduler for Kubernetes that is well suited to scheduling Spark application pods more efficiently than the default kube-scheduler. The main reason is that Volcano supports "group scheduling" or "gang scheduling": while the default Kubernetes scheduler schedules containers one by one, Volcano ensures that a gang of related containers (here, the Spark driver and its executors) is scheduled at the same time. If for any reason it is not possible to deploy all the containers in a gang, Volcano will not schedule that gang. This article explains in more detail the reasons for using Volcano.
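To make the difference between one-by-one and all-or-nothing placement concrete, here is a minimal Python sketch. It is a toy model, not Volcano's or kube-scheduler's actual algorithm: the `Node` class, the function names, the CPU-only capacity model, and the first-fit placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node model: a name and a number of free CPU units."""
    name: str
    free_cpu: int


def schedule_one_by_one(pods, nodes):
    """Place each pod independently (first fit); pods that do not fit are skipped."""
    placements = {}
    for pod_name, cpu in pods:
        for node in nodes:
            if node.free_cpu >= cpu:
                node.free_cpu -= cpu
                placements[pod_name] = node.name
                break
    return placements


def schedule_gang(pods, nodes):
    """Place all pods or none: plan on a copy of the capacities, commit only on success."""
    free = {node.name: node.free_cpu for node in nodes}
    plan = {}
    for pod_name, cpu in pods:
        target = next((name for name, cap in free.items() if cap >= cpu), None)
        if target is None:
            # One member of the gang cannot be placed, so nothing is placed.
            return {}
        free[target] -= cpu
        plan[pod_name] = target
    # Every pod fits: commit the planned capacities to the real nodes.
    for node in nodes:
        node.free_cpu = free[node.name]
    return plan


if __name__ == "__main__":
    # A Spark-like job: one driver and two executors, 2 CPU units each.
    job = [("driver", 2), ("exec-1", 2), ("exec-2", 2)]

    cluster = [Node("node-a", 2), Node("node-b", 2)]
    print("one by one:", schedule_one_by_one(job, cluster))

    cluster = [Node("node-a", 2), Node("node-b", 2)]
    print("gang:", schedule_gang(job, cluster))
```

In this demo the cluster has room for only two of the three pods. The one-by-one strategy places the driver and one executor, which then hold capacity while the job cannot fully start; the gang strategy places nothing and leaves the capacity free for another job. The first-fit rule is a simplification, so this sketch can reject a gang that a smarter packing would fit.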
Scaling Kubernetes to 7,500 Nodes
3 projects | news.ycombinator.com | 25 Jan 2021
> That said, strain on the kube-scheduler is spiky. A new job may consist of many hundreds of pods all being created at once, then return to a relatively low rate of churn.
Last I checked, the default scheduler places Pods one at a time. It might be advantageous to use a gang/batch scheduler like kube-batch, Poseidon or DCM.
What are some alternatives?
spark-on-k8s-operator - Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
argo - Workflow engine for Kubernetes
mpi-operator - Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
warewulf - Warewulf is a stateless and diskless container operating system provisioning system for large clusters of bare metal and/or virtual systems.
singularity-cri - The Singularity implementation of the Kubernetes Container Runtime Interface
sidekick - High Performance HTTP Sidecar Load Balancer
sarus - OCI-compatible engine to deploy Linux containers on HPC environments.
kube-scheduler-simulator - A web-based simulator for the Kubernetes scheduler
descheduler - Descheduler for Kubernetes [Moved to: https://github.com/kubernetes-sigs/descheduler]
charts - ⚠️(OBSOLETE) Curated applications for Kubernetes
kubernetes-operator-roiergasias - The 'Roiergasias' Kubernetes operator addresses a fundamental requirement of any data science / machine learning project that runs its pipelines on Kubernetes: quickly provisioning a declarative data pipeline on demand, using simple kubectl commands. In essence, it implements the concept of No Ops. The guiding principle is to use the best of Docker, Kubernetes, and programming language features to run a workflow with minimal workflow definition syntax. It is a Go-based workflow engine that runs on the command line or on Kubernetes with the help of a custom operator, giving you a quick, automated data pipeline for your machine learning projects (a flavor of MLOps).