After Airflow. Where next for DE?

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering.

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

  • Great point! Totally agree. Imagine a tool that can build complex, high-code data pipelines that load data from multiple sources, run a bunch of transformations in parallel, then export that data to another table or to multiple destinations, but that can also build simple data integration pipelines, e.g. fetch data from Salesforce and replicate it in Snowflake. That is what Mage can do: both batch pipelines and data integration pipelines. (A minimal sketch of a Mage block follows.)
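
A minimal sketch of what a Mage batch pipeline's blocks look like, based on the decorator pattern Mage generates for loader and transformer blocks; the HTTP source, column names, and import path are illustrative assumptions, not details from the post:

```python
import io

import pandas as pd
import requests
from mage_ai.data_preparation.decorators import data_loader, transformer


@data_loader
def load_orders(*args, **kwargs) -> pd.DataFrame:
    # Fetch raw CSV data from an illustrative HTTP source.
    response = requests.get("https://example.com/orders.csv")
    response.raise_for_status()
    return pd.read_csv(io.StringIO(response.text))


@transformer
def clean_orders(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    # Drop rows missing an amount; Mage wires this block after the loader.
    return df.dropna(subset=["amount"])
```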

  • typhoon-orchestrator

    Create elegant data pipelines and deploy to AWS Lambda or Airflow

  • We're still in the early stages, but since you've worked with Lambda, it would be really valuable to get your thoughts if you get a chance to check out the README: https://github.com/typhoon-data-org/typhoon-orchestrator.

  • astro

    Discontinued: the Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow. [Moved to: https://github.com/astronomer/astro-sdk] (by astro-projects)

  • What I would suggest, if you want an "Airflow 3.0" feel, is that you check out the Astro SDK. My team and I basically spent a year and a half rewriting the Airflow DAG-writing experience from the ground up. It has a completely different feel: highly scalable SQL/Python/Spark (soon) workflows that basically feel like native Python. They're way easier to test as well. You can pass dataframes into SQL queries, load data from any supported source to any supported warehouse, and things like lineage are natively supported :) (A sketch of that feel follows.)
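
A minimal sketch of the "pass dataframes into SQL queries" feel described above, assuming the astro-sdk-python package and an Airflow connection named snowflake_default; the table and column names are illustrative:

```python
import pandas as pd
from airflow.decorators import dag
from astro import sql as aql
from astro.table import Table
from pendulum import datetime


@aql.dataframe
def clean_orders(orders: pd.DataFrame) -> pd.DataFrame:
    # Plain pandas: the SDK hands the upstream table to you as a DataFrame.
    return orders.dropna(subset=["amount"])


@aql.transform
def top_orders(orders: Table):
    # Plain SQL: the DataFrame returned above is materialized and
    # templated into the query as a table reference.
    return "SELECT * FROM {{ orders }} ORDER BY amount DESC LIMIT 100"


@dag(start_date=datetime(2023, 1, 1), schedule=None, catchup=False)
def astro_sdk_feel():
    raw = Table(name="raw_orders", conn_id="snowflake_default")
    top_orders(clean_orders(raw))


astro_sdk_feel()
```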

  • getting-started

    This repository is a getting-started guide to Singer. (by singer-io)

  • Mage uses the Singer Spec (https://github.com/singer-io/getting-started/blob/master/docs/SPEC.md), the data engineering community's standard for building data integrations. The spec was created by Stitch and is widely adopted. (A minimal tap sketch follows.)
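
For context, the Singer Spec defines taps as programs that emit SCHEMA, RECORD, and STATE messages as JSON lines on stdout, which a target then loads. A minimal sketch using the singer-python helper library; the stream name and records are illustrative:

```python
import singer

# Declare the shape of the "users" stream before emitting any records.
schema = {
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
    }
}

singer.write_schema("users", schema, key_properties=["id"])

# RECORD messages carry the actual rows; a target (e.g. one for
# Snowflake) reads them from stdin and loads them.
singer.write_records("users", [{"id": 1, "email": "a@example.com"}])

# STATE messages checkpoint progress so the next run can resume.
singer.write_state({"users": {"last_id": 1}})
```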

  • astro-sdk

    Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

  • More of a general principle, but when you don't have design patterns, you get varying levels of results, right? I think what Astro is doing to introduce "strong defaults" through projects like the astro-sdk or the Cloud IDE is an interesting experiment: removing some of the busy work of common DAGs (load from S3, do something, push to a database) will HELP reduce the cognitive load of really common, simple actions and give them a single, better pattern to optimize on. I don't think those efforts reduce the optionality of true power users at all, who may still want to custom-code their S3 log sink with some unique implementation, while at the same time maybe solving some of the fragmentation around very frequently performed operations. 🤞 (A sketch of that common pattern follows.)
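
A sketch of that common DAG (load from S3, do something, push to a database) using the astro-sdk defaults; the bucket, connection IDs, and table names are placeholders:

```python
from airflow.decorators import dag
from astro import sql as aql
from astro.files import File
from astro.table import Table
from pendulum import datetime


@aql.transform
def dedupe(events: Table):
    return "SELECT DISTINCT * FROM {{ events }}"


@dag(start_date=datetime(2023, 1, 1), schedule=None, catchup=False)
def s3_to_warehouse():
    # load_file infers the file format and loads it into the staging table.
    staged = aql.load_file(
        input_file=File(path="s3://my-bucket/events.csv", conn_id="aws_default"),
        output_table=Table(name="stg_events", conn_id="postgres_default"),
    )
    # "Do something", then push the result to its final table.
    dedupe(staged, output_table=Table(name="events", conn_id="postgres_default"))


s3_to_warehouse()
```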

  • proposals

    Temporal proposals (by temporalio)

  • Rewrite Airflow on top of temporal.io. That way you would get unlimited scalability and very high reliability out of the box, and you could innovate on the features that matter for DE. (A sketch of the idea follows.)
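
A hedged sketch of that idea: an Airflow-style extract/load task pair expressed as a Temporal workflow with the temporalio Python SDK. The activity bodies and all names are illustrative:

```python
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def extract_orders() -> list[dict]:
    # Activities hold the side-effecting work (API calls, DB writes);
    # Temporal retries them automatically on failure.
    return [{"id": 1, "amount": 9.99}]


@activity.defn
async def load_orders(rows: list[dict]) -> None:
    ...  # write to the warehouse


@workflow.defn
class OrdersEtl:
    @workflow.run
    async def run(self) -> None:
        # Workflow code is deterministic; durability and retries come
        # from the Temporal server, not from the scheduler process.
        rows = await workflow.execute_activity(
            extract_orders, start_to_close_timeout=timedelta(minutes=5)
        )
        await workflow.execute_activity(
            load_orders, rows, start_to_close_timeout=timedelta(minutes=5)
        )
```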

