Top 14 Python Workflow engine Projects
-
luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
-
couler
Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
Level 1 of MLOps is when you've put each lifecycle stage and its interfaces into an automated pipeline. The pipeline could be a Python or bash script, or it could be a directed acyclic graph run by an orchestration framework like Airflow, Dagster, or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML, and DVC also feature pipeline capabilities.
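As a minimal illustration of the "pipeline as a Python script" end of that spectrum (the stage names and logic below are hypothetical placeholders, not tied to any framework), a Level 1 pipeline can be nothing more than functions called in dependency order:

```python
# Minimal Level-1 pipeline sketch: each lifecycle stage is a function,
# and the script itself encodes the dependency order explicitly.
# Stage names and contents are made-up placeholders for illustration.

def ingest():
    return [1, 2, 3, 4]          # pretend raw data

def transform(raw):
    return [x * 2 for x in raw]  # pretend feature engineering

def train(features):
    return sum(features) / len(features)  # pretend "model"

def evaluate(model):
    return {"score": model}

def run_pipeline():
    raw = ingest()
    features = transform(raw)
    model = train(features)
    return evaluate(model)

if __name__ == "__main__":
    print(run_pipeline())
```

An orchestration framework replaces the hard-coded call order with a declared DAG, and adds scheduling, retries, and visibility on top of the same shape.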
Project mention: Ask HN: What is the correct way to deal with pipelines? | news.ycombinator.com | 2023-09-21
I agree there are many options in this space. Two others to consider:
- https://airflow.apache.org/
- https://github.com/spotify/luigi
There are also many Kubernetes-based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
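To sketch that last suggestion (the directory layout and the `process.py` script are hypothetical, invented for illustration), a Makefile can express the dependency graph while make handles incremental rebuilds:

```make
# Hypothetical Makefile: one pattern rule per pipeline step; make resolves the DAG
# and only reprocesses inputs that changed. Paths and scripts are made up.

all: $(patsubst in/%.csv,out/%.processed,$(wildcard in/*.csv))

out/%.processed: in/%.csv
	python process.py $< > $@
```

A matching incrontab entry (also hypothetical) could then re-run make whenever a new file lands, e.g. `/data/in IN_CREATE make -C /data all`.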
Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13
Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake that it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow laying Snakemake pipelines on top of each other using semi-automatic data annotations, similar to how it is done in Kedro (https://github.com/kedro-org/kedro).
Project mention: Ask HN: Anyone use a code to mindmap/flowchart tool? | news.ycombinator.com | 2024-02-24
https://github.com/django-extensions/django-extensions/blob/...
viewflow supports BPMN: https://github.com/viewflow/viewflow
Maybe it would help you to look at the galaxy project: GitHub main site
The author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/ . Or see for example https://github.com/couler-proj/couler , which is an SDK for describing tasks that may be submitted to different workflow engines on the backend.
It's a little confusing to me that the author seems to object to "pipelines" and then equate them with message queues. For me at least, "pipeline" vs "workflow engine" vs "scheduler" are all basically synonyms in this context. Those things may or may not be implemented with a message queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. Like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
I agree with the author that message queues should not be a knee-jerk response to most problems, because the level of effort for edge cases/observability/monitoring is huge. (Maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) But don't build the scheduler from scratch either: use Argo Workflows, Kubeflow, or a more opinionated framework like Airflow, MLflow, Databricks, or AWS Lambda/Step Functions. All of these should have config or an API robust enough to express rate-limit/retry behavior, and almost any of them has better observability out of the box than you can easily get from a queue. Most importantly, they provide idioms for handling failure that data-science folks and junior devs can work with: the right way to structure code is much clearer, and things like structuring messages/events, subclassing workers, and repeating/retrying tasks are harder to mess up.
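As a tiny illustration of that "idioms for handling failure" point (a generic sketch in plain Python, not the real API of any framework named above), a workflow engine typically lets you declare a per-task retry policy rather than hand-rolling it around every call:

```python
import time

# Generic retry decorator, sketching the kind of declarative per-task
# retry policy that workflow engines expose. Not any framework's real API.
def retry(max_attempts=3, delay=0.0):
    def decorate(task):
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return task(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # give up after the final attempt
                    time.sleep(delay)
        return wrapper
    return decorate

calls = {"n": 0}

@retry(max_attempts=3)
def flaky_task():
    # Hypothetical task that fails transiently on its first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(flaky_task())
```

In a real engine this policy lives in task metadata (and is visible in the UI and logs), which is exactly the observability a bare queue doesn't give you for free.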
Python Workflow engine related posts
-
Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions
-
Navigating Week Two: Insights and Experiences from My Tublian Internship Journey
-
Airflow VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
Best ETL Tools And Why To Choose
-
Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow
-
StackStorm – IFTTT for Ops
-
Share Your favorite python related software!
Index
What are some of the best open-source Workflow engine projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | Airflow | 34,877 |
2 | luigi | 17,407 |
3 | Prefect | 14,887 |
4 | Kedro | 9,409 |
5 | viewflow | 2,576 |
6 | galaxy | 1,326 |
7 | couler | 891 |
8 | NIPY | 738 |
9 | redun | 489 |
10 | jug | 405 |
11 | flowsaber | 40 |
12 | BPMN_RPA | 37 |
13 | typhoon-orchestrator | 29 |
14 | pyDag | 24 |