- Mage: 🧙 The modern replacement for Airflow, an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
- astro: Discontinued. The Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. [Moved to: https://github.com/astronomer/astro-sdk] (by astro-projects)
- astro-sdk: The Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Great point, totally agree. You want a tool that can build complex, high-code data pipelines: load data from multiple sources, run a bunch of transformations in parallel, then export the results to another table or to multiple destinations. But that same tool should also be able to build simple data integration pipelines, e.g. fetch data from Salesforce and replicate it in Snowflake. That's what Mage can do: batch pipelines and data integration pipelines.
We're still in the early stages, but since you've worked with Lambda, it would be really valuable to hear your thoughts if you get a chance to check out the readme: https://github.com/typhoon-data-org/typhoon-orchestrator.
What I would suggest: if you want an "Airflow 3.0" feel, check out the Astro SDK. My team and I basically spent a year and a half rewriting the Airflow DAG-writing experience from the ground up. It has a completely different feel: highly scalable SQL/Python (and soon Spark) workflows that basically feel like native Python, and they're way easier to test as well. You can pass dataframes into SQL queries, load data from any supported source into any supported warehouse, and things like lineage are natively supported :)
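To make the "SQL that feels like native Python" idea concrete, here is a minimal stdlib sketch of the pattern: a decorator that turns a function returning a SQL template into a callable that materializes the query result as a new table. This is illustrative only; the `transform` decorator, the `{{name}}` placeholder syntax, and the table names are invented for the sketch and are not the actual astro-sdk API (which has its own decorators and Table objects, documented in the repo).

```python
import sqlite3

# Hypothetical sketch of the "SQL as a templated Python function" pattern.
# Not the astro-sdk API; just the shape of the idea, using sqlite3.

def transform(func):
    """Turn a function returning a SQL template into a callable that
    materializes the query's result as a new table."""
    def wrapper(conn, output_table, **tables):
        sql = func(**tables)
        # Substitute {{name}} placeholders with actual table names.
        for name, table in tables.items():
            sql = sql.replace("{{" + name + "}}", table)
        conn.execute(f"CREATE TABLE {output_table} AS {sql}")
        return output_table
    return wrapper

@transform
def big_orders(orders):
    # The SQL references a Python-level parameter, not a hardcoded table.
    return "SELECT id, amount FROM {{orders}} WHERE amount > 100"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50.0), (2, 250.0), (3, 300.0)])

big_orders(conn, output_table="big_orders", orders="orders")
rows = conn.execute("SELECT id FROM big_orders ORDER BY id").fetchall()
print(rows)  # only the rows with amount > 100
```

The appeal of the real thing is that the same function-call ergonomics work across warehouses, with lineage tracked for you.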
Mage uses the Singer Spec (https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md), the data engineering community's standard for building data integrations. It was created by Stitch and is widely adopted.
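For context on what the Singer Spec looks like on the wire: a tap writes newline-delimited JSON messages of three types (SCHEMA, RECORD, STATE) to stdout, and a target consumes them. A minimal hand-rolled illustration, with made-up stream and field names:

```python
import json

# The three Singer message types. A real tap emits these as JSON lines
# on stdout; the "users" stream and its fields are invented here.
messages = [
    {"type": "SCHEMA", "stream": "users",
     "schema": {"type": "object",
                "properties": {"id": {"type": "integer"},
                               "name": {"type": "string"}}},
     "key_properties": ["id"]},                  # primary key of the stream
    {"type": "RECORD", "stream": "users",
     "record": {"id": 1, "name": "Ada"}},        # one row of data
    {"type": "STATE",
     "value": {"bookmarks": {"users": {"last_id": 1}}}},  # resumable cursor
]

for msg in messages:
    print(json.dumps(msg))  # one JSON object per line, per the spec
```

Because the format is just typed JSON lines over a pipe, any tap can be paired with any target, which is why it became a de facto standard.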
More of a general principle, but when you don't have design patterns, you get varying levels of results, right? I think what Astro is doing to introduce "strong defaults" through projects like the astro-sdk or the Cloud IDE is an interesting experiment: removing some of the busy work of common DAGs (load from S3, do something, push to a database) will help reduce the cognitive load of really common, simple actions and give people a single, better pattern to optimize on. I don't think those efforts reduce the optionality of true power users at all; anyone who wants to custom-code their S3 log sink with some unique implementation still can, while some of the fragmentation around very frequently performed operations gets solved. 🤞
Rewrite Airflow on top of temporal.io. That way you get unlimited scalability and very high reliability out of the box, and you'd be free to innovate on the features that matter for DE.
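The reliability claim rests on Temporal's durable-execution model. Here is a toy stdlib sketch of that idea, not the actual temporalio SDK (which achieves this transparently by replaying a persisted event history): checkpoint each step's result so a retried workflow resumes where it left off instead of redoing work. All class, step, and function names here are invented.

```python
# Toy sketch of durable execution: completed steps are checkpointed,
# so re-running the workflow after a crash skips finished work.

class DurableWorkflow:
    def __init__(self, store):
        self.store = store  # persisted step results (a dict stands in for a DB)

    def step(self, name, fn):
        if name in self.store:      # already completed on a prior attempt
            return self.store[name]
        result = fn()
        self.store[name] = result   # checkpoint before moving on
        return result

calls = []  # track which steps actually execute

def run(wf):
    a = wf.step("extract", lambda: calls.append("extract") or [1, 2, 3])
    b = wf.step("transform", lambda: calls.append("transform") or [x * 2 for x in a])
    return wf.step("load", lambda: calls.append("load") or sum(b))

store = {}
total = run(DurableWorkflow(store))    # first attempt runs all three steps
total2 = run(DurableWorkflow(store))   # "retry" replays from checkpoints only
print(total, total2, calls)
```

An orchestrator built on this model gets retries, resumption, and exactly-once step effects from the runtime rather than reimplementing them in the scheduler.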
Related posts
- The Design Philosophy of Great Tables (Software Package)
- Welcome to 14 days of Data Science!
- [D] Major bug in Scikit-Learn's implementation of F-1 score
- Read files from s3 using Pandas/s3fs or AWS Data Wrangler?
- Why do companies still build data ingestion tooling instead of using a third-party tool like Airbyte?