-
kube-batch
Discontinued. A Kubernetes batch scheduler for high-performance workloads, e.g. AI/ML, BigData, HPC
tl;dr: the Knative Batch Job provider should support coscheduling and kube-batch. We developed an in-house one for KubeFlow, from scratch. We also added Apache Arrow support to knative-serving with the corresponding CloudEvents interop layer, natively (i.e. secure shared memory via the IPC namespace, instead of message passing on the same host). We use it as a direct replacement for Apache Arrow Ballista, and had planned to research a DataFusion compatibility layer further. Almost any modern ETL pipeline is pretty dubious without Apache Arrow.
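The shared-memory Arrow setup described above boils down to this pattern: the producer writes an Arrow IPC stream into a buffer that both processes on the host can map, and the consumer reads it zero-copy instead of receiving a serialized payload over a broker. A minimal sketch with pyarrow, assuming a /dev/shm path visible to both sides (the path and column names are illustrative, not the author's actual setup):

```python
# Illustrative sketch only: share an Arrow RecordBatch between two processes
# on the same host via a memory-mapped file instead of message passing.
import pyarrow as pa

batch = pa.RecordBatch.from_pydict({"id": [1, 2, 3], "score": [0.1, 0.5, 0.9]})

# Producer side: write an Arrow IPC stream into shared memory (/dev/shm assumed).
with pa.OSFile("/dev/shm/events.arrow", "wb") as sink:
    with pa.ipc.new_stream(sink, batch.schema) as writer:
        writer.write_batch(batch)

# Consumer side: memory-map the same buffer from another process, read zero-copy.
with pa.memory_map("/dev/shm/events.arrow", "r") as source:
    for received in pa.ipc.open_stream(source):
        print(received.num_rows, received.schema.names)
```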
-
PaddlePaddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core 『飞桨』 (PaddlePaddle) framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
Volcano is a batch scheduler built on top of kube-batch, targeting spark-operator, plain old MPI, PaddlePaddle, and Cromwell HPC workloads.
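For context on what the kube-batch layer adds, gang scheduling in Volcano looks roughly like this: a Job spec carries a minAvailable count and schedulerName: volcano, so all task replicas are placed together or not at all. A hedged sketch using the Kubernetes Python client; the job name, image, and replica count are made up for illustration:

```python
# Illustrative sketch: submit a minimal Volcano Job via the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

volcano_job = {
    "apiVersion": "batch.volcano.sh/v1alpha1",
    "kind": "Job",
    "metadata": {"name": "mpi-demo"},          # hypothetical job name
    "spec": {
        "minAvailable": 4,                     # gang scheduling: all 4 pods or none
        "schedulerName": "volcano",
        "tasks": [{
            "replicas": 4,
            "name": "worker",
            "template": {
                "spec": {
                    "containers": [{
                        "name": "worker",
                        "image": "mpioperator/mpi-pi:latest",  # illustrative image
                    }],
                    "restartPolicy": "Never",
                }
            },
        }],
    },
}

api.create_namespaced_custom_object(
    group="batch.volcano.sh",
    version="v1alpha1",
    namespace="default",
    plural="jobs",
    body=volcano_job,
)
```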
-
cromwell
Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments
-
tl;dr: you should start with KubeFlow 99% of the time. The respective job-scheduling workflows (including Volcano) can be managed with Kubeflow Arena. Volcano is OK, but I personally prefer NVIDIA's Merlin + Triton inference on top of ONNX and the Microsoft ONNX Runtime. I like to train with GPUs on Merlin in GKE (NVTabular and HugeCTR's tbe), and run TF Keras ReLU models on CPUs with OpenVINO on AWS EKS, to optimize costs a bit. I use Kubeflow on top of TektonCD for OpenShift, while some folks prefer Argo Workflows or Apache Airflow; in the end it's all DAG pipelines, so it doesn't really matter.
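To make the "it's all DAG pipelines" point concrete, here is a minimal sketch of a two-step Kubeflow Pipelines (kfp v2) DAG; the component names and their bodies are placeholders, not taken from the comment above:

```python
# Illustrative two-step Kubeflow Pipelines DAG; component logic is a placeholder.
from kfp import dsl, compiler

@dsl.component
def preprocess(rows: int) -> int:
    # pretend to clean/validate `rows` records and return how many survived
    return rows

@dsl.component
def train(rows: int) -> str:
    return f"model trained on {rows} rows"

@dsl.pipeline(name="toy-training-dag")
def toy_pipeline(rows: int = 1000):
    pre = preprocess(rows=rows)
    train(rows=pre.output)  # this data dependency is the DAG edge

if __name__ == "__main__":
    # Compile to a pipeline spec that Kubeflow (or a compatible engine) can run.
    compiler.Compiler().compile(toy_pipeline, "toy_pipeline.yaml")
```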
Related posts
-
Primer on Distributed Parallel Processing with Ray using KubeRay
-
Baidu AI Researchers Introduce SE-MoE That Proposes Elastic MoE Training With 2D Prefetch And Fusion Communication Over Hierarchical Storage
-
Ask HN: Who is hiring? (March 2022)
-
I have issue with only __habs for half datatype? Please help!
-
Machine Learning Orchestration on Kubernetes using Kubeflow