| | couler | hera |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 890 | 489 |
| Growth | 1.2% | 4.9% |
| Activity | 5.2 | 9.3 |
| Latest commit | 20 days ago | 7 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
couler
-
(Not) to Write a Pipeline
the author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/. or see for example https://github.com/couler-proj/couler, which is an sdk for describing tasks that may be submitted to different workflow engines on the backend.
it's a little confusing to me that the author seems to object to "pipelines" and then equate them with messaging-queues. for me at least, "pipeline" vs "workflow-engine" vs "scheduler" are all basically synonyms in this context. those things may or may not be implemented with a message-queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
i agree with the author that message-queues should not be a knee-jerk response to most problems, because the level of effort for edge-cases/observability/monitoring is huge. (maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) but don't build the scheduler from scratch either. use argo workflows, kubeflow, or a more opinionated framework like airflow, mlflow, databricks, aws lambda or step-functions. any of these should have config or an api that's robust enough to express rate-limit/retry stuff, and almost any of them has better observability out-of-the-box than you can easily get from a queue. but most importantly, they provide idioms for handling failure that data-science folks and junior devs can work with. the right way to structure code is just much more clear, and things like structuring messages/events, subclassing workers, and repeating/retrying tasks are just harder to mess up.
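To make the "state/timestamps/logs" and "rate-limit/retry" points concrete, here is a minimal sketch of the bookkeeping a workflow engine does per task. It is illustration only, not a suggestion to build this yourself and not the API of Argo Workflows, couler, or any other tool named above: `RetryPolicy`, `TaskRecord`, and `run_task` are hypothetical names, and the state strings are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class RetryPolicy:
    """Hypothetical declarative retry settings (not any engine's real schema)."""
    limit: int = 3                # maximum retries after the first attempt
    backoff_seconds: float = 0.0  # base delay, doubled after each failure


@dataclass
class TaskRecord:
    """The per-task bookkeeping an engine keeps: state, timestamps, errors."""
    name: str
    state: str = "Pending"
    attempts: int = 0
    started_at: Optional[float] = None
    finished_at: Optional[float] = None
    errors: List[str] = field(default_factory=list)
    result: Any = None


def run_task(name: str, fn: Callable[[], Any], policy: RetryPolicy) -> TaskRecord:
    """Run fn, retrying per policy, and return the full record of what happened."""
    record = TaskRecord(name=name, started_at=time.time())
    delay = policy.backoff_seconds
    for _ in range(policy.limit + 1):
        record.attempts += 1
        record.state = "Running"
        try:
            record.result = fn()
            record.state = "Succeeded"
            break
        except Exception as exc:
            record.errors.append(repr(exc))
            record.state = "Failed"
            if record.attempts <= policy.limit:  # retries remain: back off first
                time.sleep(delay)
                delay *= 2
    record.finished_at = time.time()
    return record


# Usage: a step that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


record = run_task("flaky-step", flaky, RetryPolicy(limit=3, backoff_seconds=0.0))
print(record.state, record.attempts, len(record.errors))
```

Even this toy version needs a state machine, timestamps, and an error log. A real engine adds persistence, a UI, and concurrency limits on top, which is the case for adopting one rather than growing it around a queue.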
hera
-
New Argo Workflows Python SDK - Hera!
Argo Labs now features Hera Workflows! Hera is a very simple Python library for constructing workflows - it can run any Python script you submit, allowing you and your company to focus on the value of the script itself rather than workflow construction :) you can run anything! ML workflows, DevOps jobs, crons with API calls, ETLs using pure Python, containers with commands, easily integrate with things like Horovod, etc.
What are some alternatives?
soopervisor - ☁️ Export Ploomber pipelines to Kubernetes (Argo), Airflow, AWS Batch, SLURM, and Kubeflow.
flyte - Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
community - Information about the Kubeflow community including proposals and governance information.
awesome-argo - A curated list of awesome projects and resources related to Argo (a CNCF graduated project)
argo - Workflow Engine for Kubernetes
django-hurricane - Hurricane is an initiative to fit Django perfectly with Kubernetes.
doit - task management & automation tool
elyra - Elyra extends JupyterLab with an AI centric approach.
covalent - Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
argo-workflows-aws-plugin - Argo Workflows Executor Plugin for AWS Services, e.g. SageMaker Pipelines, Glue, etc.
covalent - Teradata UI Platform built on Angular Material