| | arena | mpi-operator |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 709 | 401 |
| Growth | 1.8% | 2.2% |
| Activity | 8.3 | 7.3 |
| Last commit | 5 days ago | 13 days ago |
| Language | Go | Go |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
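To make the Growth figure concrete, here is a minimal sketch of the arithmetic behind it. It assumes growth is defined as (current - previous) / previous, which is the usual month-over-month convention; the page does not state its exact formula, and the helper names `previous_stars` and `stars_gained` are hypothetical.

```python
def previous_stars(current: int, growth_pct: float) -> float:
    """Star count one month earlier implied by a month-over-month growth rate.

    Assumes growth_pct = (current - previous) / previous * 100.
    """
    return current / (1 + growth_pct / 100)


def stars_gained(current: int, growth_pct: float) -> float:
    """Approximate number of stars added during the last month."""
    return current - previous_stars(current, growth_pct)


# Values taken from the comparison table above.
for name, stars, growth in [("arena", 709, 1.8), ("mpi-operator", 401, 2.2)]:
    gained = stars_gained(stars, growth)
    print(f"{name}: roughly {gained:.1f} stars gained in the last month")
```

Under that assumption, a higher growth percentage does not imply more new stars in absolute terms: mpi-operator grows faster relative to its size, while arena adds more stars per month.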
arena
- Volcano vs Yunikorn vs Knative
tl;dr: you should start with Kubeflow 99% of the time. The respective job scheduling workflows (including Volcano) can be managed with Kubeflow Arena. Volcano is OK, but I personally prefer Nvidia's Merlin + Triton inference on top of ONNX and MS ONNX Runtime. I do like to train with GPUs on Merlin in GKE (NVTabular and HugeCTR's tbe), and run TFKeras ReLU models on CPUs with OpenVINO on AWS EKS, to optimize costs a bit. I do use Kubeflow on top of TektonCD for OpenShift, while some folks prefer Argo Workflows and Apache Airflow; in the end it's all DAG pipelines, so it doesn't really matter.
mpi-operator
- Scaling Kubernetes to 7,500 Nodes
Hi, kube-scheduler maintainer here, currently looking into enabling MPI use cases in k8s.
We started a discussion in https://github.com/kubeflow/mpi-operator/issues/315
What are some alternatives?
determined - Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
kube-batch - A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC
ML-Workspace - 🛠 All-in-one web-based IDE specialized for machine learning and data science.
polyaxon - MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle
onepanel - The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.
cromwell - Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
kserve - Standardized Serverless ML Inference Platform on Kubernetes
sarus - OCI-compatible engine to deploy Linux containers on HPC environments.
kubeflow - Machine Learning Toolkit for Kubernetes
optimism-v2 - ARCHIVE of monorepo implementing Boba, an L2 Compute solution built on Optimistic Ethereum - active repo is at https://github.com/bobanetwork/boba