Top 23 Workflow engine Open-Source Projects

Airflow

169 34,397 10.0 Python

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12

Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
luigi

14 17,292 6.4 Python

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
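
As a rough illustration of what "dependency resolution" means for a pipeline of batch jobs, the sketch below orders a small job graph so that every job runs after the jobs it depends on. It uses Python's standard-library graphlib (Python 3.9+), not Luigi's own API; the job names and the resolve_order helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical batch jobs, each mapped to the set of jobs it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}

def resolve_order(deps):
    """Return an execution order in which every job follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = resolve_order(dependencies)
print(order)
```

A real workflow engine adds scheduling, retries, and state tracking on top of this ordering step.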

Project mention: Ask HN: What is the correct way to deal with pipelines? | news.ycombinator.com | 2023-09-21

I agree there are many options in this space. Two others to consider:
- https://airflow.apache.org/
- https://github.com/spotify/luigi
There are also many Kubernetes based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
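
For the single-host case mentioned above, where work is triggered by a new file showing up in a directory, a minimal sketch of the idea could look like this. It polls the directory rather than using inotify as incrond does, and the new_files helper and file names are hypothetical.

```python
import os
import tempfile

def new_files(directory, seen):
    """Return names that appeared in `directory` since the last call.

    `seen` is a set that is updated in place to remember earlier files.
    """
    current = set(os.listdir(directory))
    added = sorted(current - seen)
    seen.update(current)
    return added

with tempfile.TemporaryDirectory() as inbox:
    seen = set()
    first = new_files(inbox, seen)    # nothing has arrived yet
    open(os.path.join(inbox, "batch1.csv"), "w").close()
    second = new_files(inbox, seen)   # the new file is reported once
    third = new_files(inbox, seen)    # and is not reported again
```

Each name returned by new_files would be handed to whatever processes the file, for example a make invocation.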
Prefect

19 14,512 9.9 Python

The easiest way to build, run, and monitor data pipelines at scale.

Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13
argo

43 14,259 9.8 Go

Workflow Engine for Kubernetes

Project mention: StackStorm – IFTTT for Ops | news.ycombinator.com | 2023-11-05

Like Argo Workflows?
https://github.com/argoproj/argo-workflows
conductor

5 10,000 9.3 Java

Conductor is an event driven orchestration platform (by conductor-oss)

Project mention: Show HN: Hatchet – Open-source distributed task queue | news.ycombinator.com | 2024-03-08
temporal

16 9,739 9.8 Go

Temporal service

Project mention: Rethinking Serverless with Flame | news.ycombinator.com | 2023-12-06

I don't know if I agree with the argument regarding durability vs elastic execution. If I can get both (with a nice API/DX) via something like Temporal (https://github.com/temporalio/temporal), what's the drawback here?
Kedro

29 9,341 9.7 Python

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake that it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow laying Snakemake pipelines on top of each other using semi-automatic data annotations, similar to how it is done in kedro (https://github.com/kedro-org/kedro).
Flowable (V6)

2 7,354 9.2 Java

A compact and highly efficient workflow and Business Process Management (BPM) platform for developers, system admins and business users.
kestra

32 6,260 9.9 Java

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

Project mention: A High-Performance, Java-Based Orchestration Platform | /r/java | 2023-10-11

Kestra's communication is asynchronous and based on a queuing mechanism. It leverages the Micronaut framework and offers two runners: one that uses a database (JDBC) for both the message queue and resource storage, and another that uses Kafka as the message queue and Elasticsearch as the resource storage. The platform is fully extensible and plugin-based, providing a rich set of plugins for various workflow tasks, triggers, and data storage options. For those interested, the GitHub repository is available here: https://github.com/kestra-io/kestra
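
To illustrate the general pattern of asynchronous, queue-based communication described above, here is a minimal in-process sketch. It is not Kestra's implementation (Kestra is Java-based and uses JDBC or Kafka as the queue); the run_tasks function, the sentinel convention, and the doubling "work" are assumptions made for the example.

```python
import queue
import threading

def run_tasks(tasks):
    """Send tasks to a worker through one queue and collect results from another."""
    task_queue = queue.Queue()    # executor -> worker
    result_queue = queue.Queue()  # worker -> executor

    def worker():
        while True:
            task = task_queue.get()
            if task is None:      # sentinel: no more work
                break
            name, value = task
            result_queue.put((name, value * 2))  # stand-in for real work

    thread = threading.Thread(target=worker)
    thread.start()
    for task in tasks:
        task_queue.put(task)      # the producer never calls the worker directly
    task_queue.put(None)
    thread.join()

    results = {}
    while not result_queue.empty():
        name, output = result_queue.get()
        results[name] = output
    return results

results = run_tasks([("a", 1), ("b", 2)])
print(results)
```

Because the two sides only share queues, the queue backend can be swapped without changing either side, which is the property the two runners described above rely on.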
Workflow Core

4 5,049 3.4 C#

Lightweight workflow engine for .NET Standard
devtron

97 3,835 9.7 Go

Tool integration platform for Kubernetes

Project mention: Devtron - End-to-End Software Delivery for Kubernetes Applications | /r/kubernetes | 2023-10-05
zeebe

6 3,013 10.0 Java

Distributed Workflow Engine for Microservices Orchestration

Project mention: Is there a product that can orchestrate running jobs? | /r/kubernetes | 2023-07-12
hatchet

16 2,683 9.6 Python

A distributed, fault-tolerant task queue

Project mention: Ask HN: Who is hiring? (April 2024) | news.ycombinator.com | 2024-04-01

Hatchet (https://hatchet.run) | New York City | Full-time
We're hiring a founding engineer to help us with development on our open-source, distributed task queue: https://github.com/hatchet-dev/hatchet.
We recently launched on HN, you can check out our launch here: https://news.ycombinator.com/item?id=39643136. We're two second-time YC founders in this for the long haul and we are just wrapping up the YC W24 batch.
As a founding engineer, you'll be responsible for contributing across the entire codebase. We'll compensate accordingly and with high equity. It's currently just the two founders + a part-time contractor. We're all technical and contribute code.
Stack: Typescript/React, Go and PostgreSQL.
To apply, email alexander [at] hatchet [dot] run, and include the following:
1. Tell us about something impressive you've built.
2. Ask a question or write a comment about the state of the project. For example: a file that stood out to you in the codebase, a Github issue or discussion that piqued your interest, a general comment on distributed systems/task queues, or why our code is bad and how you could improve it.
nextflow

9 2,538 9.7 Groovy

A DSL for data-driven computational pipelines

Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

> It's been a while since you can rerun/resume Nextflow pipelines
Yes, you can resume, but you need your whole upstream DAG to be present. Snakemake can rerun a job when only the dependencies of that job are present, which allows you to neatly manage disk usage, or to archive an intermediate state of a project and rerun things from there.
> and yes, you can have dry runs in Nextflow
You have stubs, which really isn't the same thing.
> I have no idea what you're referring to with the 'arbitrary limit of 1000 parallel jobs' though
I was referring to this issue: https://github.com/nextflow-io/nextflow/issues/1871. Except, the discussion doesn't do the issue full justice. Nextflow spawns each job in a separate thread, and when it tries to spawn 1000+ condor jobs it dies with a cryptic error message. Using -Dnxf.pool.type=sync and -Dnxf.pool.maxThreads=N prevents resuming, and Nextflow attempts to rerun the pipeline.
> As for deleting temporary files, there are features that allow you to do a few things related to that, and other features being implemented.
There are some hacks for this - but nothing I would feel safe to integrate into a production tool. They are implementing something - you're right - and it's been the case for several years now, so we'll see.
Snakemake has all that out of the box.
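
The comment above contrasts one thread per job with a capped thread pool. As a generic Python illustration (not Nextflow's implementation; run_jobs and the job body are hypothetical), a bounded pool keeps the number of concurrently running jobs under a fixed limit no matter how many jobs are submitted:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_jobs(job_ids, max_threads):
    """Run all jobs on at most `max_threads` threads; return results and peak concurrency."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def job(job_id):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.001)          # stand-in for submitting and waiting on a job
        with lock:
            active -= 1
        return job_id * job_id

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(job, job_ids))
    return results, peak

results, peak = run_jobs(range(1500), max_threads=8)
print(len(results), peak)
```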
viewflow

5 2,537 7.8 Python

Reusable workflow library for Django

Project mention: Ask HN: Anyone use a code to mindmap/flowchart tool? | news.ycombinator.com | 2024-02-24

https://github.com/django-extensions/django-extensions/blob/...
viewflow supports BPMN: https://github.com/viewflow/viewflow
awesome-argo

6 1,780 7.3

A curated list of awesome projects and resources related to Argo (a CNCF graduated project)
galaxy

4 1,310 10.0 Python

Data intensive science for everyone.

Project mention: Need for GUIs for bioinformatic tools? | /r/bioinformatics | 2023-06-17

Maybe it would help you to look at the galaxy project: GitHub main site
uTask

0 1,095 8.2 Go

µTask is an automation engine that models and executes business processes declared in yaml. ✏️📋
scipipe

1 1,052 3.0 Go

Robust, flexible and resource-efficient pipelines using Go and the commandline
goflow

1 1,025 6.5 CSS

A Golang based high performance, scalable and distributed workflow framework (by s8sg)
Gush

0 1,023 6.5 Ruby

Fast and distributed workflow runner using ActiveJob and Redis
titanoboa

5 905 0.0 Clojure

Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
couler

1 883 5.2 Python

Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

Project mention: (Not) to Write a Pipeline | news.ycombinator.com | 2023-06-27

author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/ . or see for example https://github.com/couler-proj/couler , which is an sdk for describing tasks that may be submitted to different workflow engines on the backend.
it's a little confusing to me that the author seems to object to "pipelines" and then equate them with messaging-queues. for me at least, "pipeline" vs "workflow-engine" vs "scheduler" are all basically synonyms in this context. those things may or may not be implemented with a message-queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
i agree with author that message-queues should not be a knee-jerk response to most problems because the LoE for edge-cases/observability/monitoring is huge. (maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) but don't build the scheduler from scratch either.. use argowf, kubeflow, or a more opinionated framework like airflow, mlflow, databricks, aws lambda or step-functions. all/any of these should have config or api that's robust enough to express rate-limit/retry stuff. almost any of these choices has better observability out-of-the-box than you can easily get from a queue. but most importantly.. they provide idioms for handling failure that data-science folks and junior devs can work with. the right way to structure code is just much more clear and things like structuring messages/events, subclassing workers, repeating/retrying tasks, is just harder to mess up.
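
As a small illustration of the "retry stuff" that the comment above says workflow engines let you express through config or API, here is a hand-rolled sketch of retrying a task with exponential backoff. The retry helper, its parameters, and the flaky task are hypothetical and do not correspond to any particular engine's API.

```python
import time

def retry(func, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call `func`, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise                     # out of attempts: surface the failure
            sleep(base_delay * 2 ** (attempt - 1))

calls = {"n": 0}

def flaky():
    """Hypothetical task that fails twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = retry(flaky)
print(result, calls["n"])
```

In an engine this policy is usually a few lines of task configuration, which is the point the comment makes about failure-handling idioms being harder to get wrong.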

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-01.

Workflow engine related posts

Show HN: Workflow Orchestrator in Golang
7 projects | news.ycombinator.com | 4 Mar 2024
Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions
1 project | dev.to | 12 Feb 2024
When "letting it crash" is not enough
4 projects | news.ycombinator.com | 7 Feb 2024
How To Collect Temporal.io Logs Using Axiom And Pino
4 projects | dev.to | 17 Jan 2024
Ask HN: How have you implemented human-in-the-loop workflows?
1 project | news.ycombinator.com | 11 Jan 2024
Navigating Week Two: Insights and Experiences from My Tublian Internship Journey
1 project | dev.to | 31 Dec 2023
Which queue System you prefer for ecommerce and PS
1 project | /r/node | 7 Dec 2023

Index

What are some of the best open-source Workflow engine projects? This list will help you:

#	Project	Stars
1	Airflow	34,397
2	luigi	17,292
3	Prefect	14,512
4	argo	14,259
5	conductor	10,000
6	temporal	9,739
7	Kedro	9,341
8	Flowable (V6)	7,354
9	kestra	6,260
10	Workflow Core	5,049
11	devtron	3,835
12	zeebe	3,013
13	hatchet	2,683
14	nextflow	2,538
15	viewflow	2,537
16	awesome-argo	1,780
17	galaxy	1,310
18	uTask	1,095
19	scipipe	1,052
20	goflow	1,025
21	Gush	1,023
22	titanoboa	905
23	couler	883