Top 14 Python Workflow engine Projects
-
luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
-
couler
Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
Level 1 of MLOps is when you've put each lifecycle stage and its interfaces into an automated pipeline. The pipeline could be a Python or bash script, or it could be a directed acyclic graph run by an orchestration framework like Airflow, Dagster, or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML, and DVC also feature pipeline capabilities.
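As a minimal illustration of the "pipeline as a Python script" end of that spectrum (the stage names and logic below are hypothetical placeholders, not tied to any framework), a Level 1 pipeline can be nothing more than functions called in dependency order:

```python
# Minimal Level-1 pipeline sketch: each lifecycle stage is a function,
# and the script itself encodes the dependency order explicitly.
# Stage names and contents are made-up placeholders for illustration.

def ingest():
    return [1, 2, 3, 4]          # pretend raw data

def transform(raw):
    return [x * 2 for x in raw]  # pretend feature engineering

def train(features):
    return sum(features) / len(features)  # pretend "model"

def evaluate(model):
    return {"score": model}

def run_pipeline():
    raw = ingest()
    features = transform(raw)
    model = train(features)
    return evaluate(model)

if __name__ == "__main__":
    print(run_pipeline())
```

An orchestration framework replaces the hard-coded call order with a declared DAG, and adds scheduling, retries, and visibility on top of the same shape.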
Project mention: Ask HN: What is the correct way to deal with pipelines? | news.ycombinator.com | 2023-09-21
I agree there are many options in this space. Two others to consider:
- https://airflow.apache.org/
- https://github.com/spotify/luigi
There are also many Kubernetes-based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
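To sketch that last suggestion (the directory layout and the `process.py` script are hypothetical, invented for illustration), a Makefile can express the dependency graph while make handles incremental rebuilds:

```make
# Hypothetical Makefile: one pattern rule per pipeline step; make resolves the DAG
# and only reprocesses inputs that changed. Paths and scripts are made up.

all: $(patsubst in/%.csv,out/%.processed,$(wildcard in/*.csv))

out/%.processed: in/%.csv
	python process.py $< > $@
```

A matching incrontab entry (also hypothetical) could then re-run make whenever a new file lands, e.g. `/data/in IN_CREATE make -C /data all`.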
Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13
Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake that it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow laying Snakemake pipelines on top of each other using semi-automatic data annotations, similar to how it is done in Kedro (https://github.com/kedro-org/kedro).
Project mention: Ask HN: Anyone use a code to mindmap/flowchart tool? | news.ycombinator.com | 2024-02-24
https://github.com/django-extensions/django-extensions/blob/...
viewflow supports BPMN: https://github.com/viewflow/viewflow
Maybe it would help you to look at the galaxy project: GitHub main site
The author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/ . Or see for example https://github.com/couler-proj/couler , which is an SDK for describing tasks that may be submitted to different workflow engines on the backend.
It's a little confusing to me that the author seems to object to "pipelines" and then equate them with message queues. For me at least, "pipeline" vs "workflow engine" vs "scheduler" are all basically synonyms in this context. Those things may or may not be implemented with a message queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. Like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
I agree with the author that message queues should not be a knee-jerk response to most problems, because the level of effort for edge cases/observability/monitoring is huge. (Maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) But don't build the scheduler from scratch either: use Argo Workflows, Kubeflow, or a more opinionated framework like Airflow, MLflow, Databricks, or AWS Lambda/Step Functions. All of these should have config or an API robust enough to express rate-limit/retry behavior, and almost any of them has better observability out of the box than you can easily get from a queue. Most importantly, they provide idioms for handling failure that data-science folks and junior devs can work with: the right way to structure code is much clearer, and things like structuring messages/events, subclassing workers, and repeating/retrying tasks are harder to mess up.
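As a tiny illustration of that "idioms for handling failure" point (a generic sketch in plain Python, not the real API of any framework named above), a workflow engine typically lets you declare a per-task retry policy rather than hand-rolling it around every call:

```python
import time

# Generic retry decorator, sketching the kind of declarative per-task
# retry policy that workflow engines expose. Not any framework's real API.
def retry(max_attempts=3, delay=0.0):
    def decorate(task):
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return task(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # give up after the final attempt
                    time.sleep(delay)
        return wrapper
    return decorate

calls = {"n": 0}

@retry(max_attempts=3)
def flaky_task():
    # Hypothetical task that fails transiently on its first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(flaky_task())
```

In a real engine this policy lives in task metadata (and is visible in the UI and logs), which is exactly the observability a bare queue doesn't give you for free.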
Python Workflow engine related posts
-
Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions
-
Navigating Week Two: Insights and Experiences from My Tublian Internship Journey
-
Airflow VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023
-
Best ETL Tools And Why To Choose
-
Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow
-
StackStorm – IFTTT for Ops
-
Share Your favorite python related software!
Index
What are some of the best open-source Workflow engine projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | Airflow | 34,877 |
2 | luigi | 17,407 |
3 | Prefect | 14,887 |
4 | Kedro | 9,409 |
5 | viewflow | 2,576 |
6 | galaxy | 1,326 |
7 | couler | 891 |
8 | NIPY | 738 |
9 | redun | 489 |
10 | jug | 405 |
11 | flowsaber | 40 |
12 | BPMN_RPA | 37 |
13 | typhoon-orchestrator | 29 |
14 | pyDag | 24 |