hera
couler
Metric | hera | couler |
---|---|---|
Mentions | 1 | 1 |
Stars | 484 | 885 |
Growth | 9.3% | 1.6% |
Activity | 9.3 | 5.2 |
Last commit | 3 days ago | 11 days ago |
Language | Python | Python |
License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hera
- New Argo Workflows Python SDK - Hera!
Argo Labs now features Hera Workflows! Hera is a very simple Python library for constructing workflows. It can run any Python script you submit, so you and your company can focus on the value of the script itself rather than on workflow construction :) You can run anything: ML workflows, DevOps jobs, crons with API calls, ETLs in pure Python, containers with commands, and integrations with tools like Horovod.
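To make "constructing a workflow from a Python script" concrete, here is a minimal standard-library sketch of the kind of Argo Workflows manifest such a library might generate for a single script step. This is not Hera's API: the helper `script_to_workflow`, the step name `hello`, and the default image `python:3.11` are all hypothetical and chosen only for illustration.

```python
import json
import textwrap


def script_to_workflow(name: str, source: str, image: str = "python:3.11") -> dict:
    """Wrap a Python script in a one-step Argo Workflow manifest.

    Hypothetical helper for illustration only; the image is an assumption.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName lets the cluster append a unique suffix per run.
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "entrypoint": name,
            "templates": [
                {
                    "name": name,
                    # A script template runs `source` with the given command.
                    "script": {
                        "image": image,
                        "command": ["python"],
                        "source": textwrap.dedent(source).strip(),
                    },
                }
            ],
        },
    }


manifest = script_to_workflow(
    "hello",
    """
    print("hello from a workflow step")
    """,
)
print(json.dumps(manifest, indent=2))
```

The point of an SDK is that you write only the script body and the library takes care of the surrounding manifest, plus submission to the cluster, which this sketch leaves out.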
couler
- (Not) to Write a Pipeline
the author seems to be describing the kind of patterns you might build with argo workflows (https://argoproj.github.io/argo-workflows/). or see, for example, couler (https://github.com/couler-proj/couler), which is an sdk for describing tasks that may be submitted to different workflow engines on the backend.
it's a little confusing to me that the author seems to object to "pipelines" and then equate them with messaging-queues. for me at least, "pipeline" vs "workflow-engine" vs "scheduler" are all basically synonyms in this context. those things may or may not be implemented with a message-queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
i agree with the author that message-queues should not be a knee-jerk response to most problems, because the level of effort (LoE) for edge-cases/observability/monitoring is huge. (maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) but don't build the scheduler from scratch either. use argowf, kubeflow, or a more opinionated framework like airflow, mlflow, databricks, aws lambda or step-functions. any of these should have config or an api that's robust enough to express rate-limit/retry stuff. almost any of these choices has better observability out-of-the-box than you can easily get from a queue. but most importantly, they provide idioms for handling failure that data-science folks and junior devs can work with. the right way to structure code is just much more clear, and things like structuring messages/events, subclassing workers, and repeating/retrying tasks are just harder to mess up.
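As a rough illustration of the "retry stuff" a workflow engine gives you as configuration, here is a minimal standard-library sketch of retrying a task with exponential backoff. It is not the API of any engine named above: `run_with_retry`, its parameters, and `flaky_task` are hypothetical names used only to show what you would otherwise end up hand-rolling around a queue.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponential backoff.

    Hypothetical helper: the delay doubles after each failed attempt, and the
    last exception is re-raised once `max_attempts` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Example: a task that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky_task() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


delays: list = []
# Record the requested delays instead of actually sleeping.
result = run_with_retry(flaky_task, max_attempts=5, base_delay=1.0, sleep=delays.append)
print(result, delays)
```

Even this small helper has to decide which errors are retryable, how long to wait, and when to give up. With a workflow engine those decisions typically live in per-task configuration, alongside the logs and state tracking mentioned above.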
What are some alternatives?
flyte - Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
soopervisor - ☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.
awesome-argo - A curated list of awesome projects and resources related to Argo (a CNCF graduated project)
community - Information about the Kubeflow community including proposals and governance information.
django-hurricane - Hurricane is an initiative to fit Django perfectly with Kubernetes.
argo - Workflow Engine for Kubernetes
doit - task management & automation tool
covalent - Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
elyra - Elyra extends JupyterLab with an AI centric approach.
covalent - Teradata UI Platform built on Angular Material
argo-workflows-aws-plugin - Argo Workflows Executor Plugin for AWS Services, e.g. SageMaker Pipelines, Glue, etc.