Airflow vs dagster

Compare Airflow and dagster and see what their differences are.

Airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows (by apache)

dagster

An orchestration platform for the development, production, and observation of data assets. (by dagster-io)
                     Airflow              dagster
Mentions             59                   16
Stars                23,851               4,026
Stars growth         2.6%                 4.7%
Activity             10.0                 9.9
Latest commit        5 days ago           3 days ago
Language             Python               Python
License              Apache License 2.0   Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Airflow

Posts with mentions or reviews of Airflow. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-11-09.

dagster

Posts with mentions or reviews of dagster. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-10-11.
  • Airflow 2.0 vs Prefect
    1 project | reddit.com/r/dataengineering | 20 Oct 2021
    It has been such a pleasure to use dagster. The testability is nice. It was designed to be type aware, so you can leverage type checks, and it is also designed to be data aware when it comes to passing data between tasks. One negative is its handling of cases where a task produces no output but still needs to be declared as a dependency of another task, for which you use its Nothing abstraction; the syntax for this situation is awkward IMO, and they've recognized that (a minimal sketch of this pattern appears after this list of posts). Its UI, called dagit, is hands down the best, as it provides rich information on each task in your DAG. The developer experience is definitely better with dagster compared to Airflow. I briefly looked at Airflow 2.0 examples, and I still think dagster's API (as of version 0.13.x) is better. However, on the managed side there is no third-party managed dagster provider; only Elementl, the creator of dagster, has a cloud offering, and it is currently in beta. So there are no mature managed services for dagster yet. Again, this is because dagster is a relatively new library - less than three years old.
  • MLOps project based template
    4 projects | reddit.com/r/mlops | 11 Oct 2021
    Data Pipeline - Dagster
  • Runflow - define and run workflows using HCL2
    1 project | reddit.com/r/datascience | 24 Jul 2021
    I feel like dagster is a hidden gem that is useful to a broader base of Python data personas because it is cross-platform, so all of its key features (scheduler and web UI) work on Windows, unlike other major workflow orchestration frameworks. Airflow? Nope. Prefect? Nope. They effectively ignored all the small-fry data folks in the corporate Windows world who are still critical to their respective organizations. So even a data analyst with Python coding experience can become immediately productive with dagster. Its web UI is optional: you can execute pipelines through the Python API, the CLI, or the web UI (see the execution sketch after this list of posts), so you are not forced to run things one way. It is designed to be general purpose rather than domain specific, which I think is a good thing, as it means it can be used in a variety of use cases.
  • New to data orchestration? Start here.
    2 projects | dev.to | 2 Jun 2021
    Second-generation data orchestration tools like Dagster and Prefect are more focused on being data-driven. They’re able to detect the kinds of data within DAGs and improve data awareness by anticipating the actions triggered by each data type.
  • Is Airflow a passé? What replaces it?
    2 projects | reddit.com/r/dataengineering | 10 May 2021
    There's Prefect and Dagster as up-and-comers in the space.
  • Scheduling tools for ETL and ML flow
    3 projects | reddit.com/r/dataengineering | 7 May 2021
    I would give dagster a look. It has a built-in native scheduler (see the scheduling sketch after this list of posts) and is cross-platform. It is general purpose, so your team can grow with it and tackle a broader set of use cases if needed. If you struggle to get started after reading their docs/tutorials, you can take a look at my personal repo. I've gotten feedback that my example has been very useful for getting started. I know they revamped their docs recently, but I haven't looked at their tutorial again or checked whether they provide an intermediate-level full example yet, so I need to get back in there to see.
  • API versioning has no “right way” (2017)
    2 projects | news.ycombinator.com | 26 Apr 2021
    Versioning is indeed a hard topic, especially for data science/engineering projects in production.

    When you have a pipeline defined as a complex DAG of operations, you can't just version the entire thing, unless you have enough resources to re-compute from scratch with every change, which is wasteful. So then, you have to keep track of data dependencies and their versions if you want to ensure reproducibility.

    Versioning code isn't enough when you have runtime parameters that affect output data, and you want to stay flexible by allowing experiments and re-running computations with different parameters so you can iterate quickly - which poses a lot of challenges.

    And there doesn't seem to be a framework that solves those issues out of the box. I'm closely watching Dagster (https://dagster.io), as they seem to be aware of those challenges (for example, for versioning: https://docs.dagster.io/guides/dagster/memoization), but I haven't tried it yet; it introduces a lot of concepts and has a steep learning curve.

  • Best technologies for a beginner DE.
    1 project | reddit.com/r/dataengineering | 21 Apr 2021
    dagster!
  • Hi, how can I do pipeline automation?
    2 projects | reddit.com/r/learnpython | 18 Apr 2021
    If you are just starting out or new to automation, I would look at plain Python scripts executed with cron on Linux/Mac or Task Scheduler on Windows, though you'll need some bash (Linux/Mac) or batch (Windows) knowledge. Then graduate to using frameworks. Since you didn't specify what types of jobs you want to automate, for general-purpose needs I would look at the class of frameworks called task orchestration frameworks or workflow management libraries. I would highly recommend dagster, as it comes with a native scheduler, so you would be free from having to use cron or Task Scheduler. Other options include Prefect, but if you want its other features, like its scheduler and web GUI, you'll have to mess with Docker. That's what's nice about dagster: it all works out of the box without needing non-Python dependencies.
  • Open source contributions for a Data Engineer?
    17 projects | reddit.com/r/dataengineering | 16 Apr 2021
    It's a near crime that Dagster hasn't been mentioned already.
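
Several of the posts above refer to specific Dagster features without showing code. For reference, here is a minimal sketch of the Nothing dependency pattern called out as awkward in the "Airflow 2.0 vs Prefect" post, written against the 0.13-era @op/@job API; the op and job names are made up for illustration.

    from dagster import In, Nothing, job, op


    @op
    def create_table(context) -> Nothing:
        # Produces no data output - only a side effect such as running a DDL statement.
        context.log.info("table created")


    @op(ins={"start": In(Nothing)})
    def load_table(context):
        # The Nothing-typed input carries no data; it only forces load_table to
        # wait for create_table to finish.
        context.log.info("loading data into the table")


    @job
    def nothing_dependency_job():
        # This wiring is the part the commenter finds awkward: the upstream op's
        # empty result still has to be passed into the named Nothing input.
        load_table(start=create_table())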
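
The Runflow thread's point that the web UI is optional comes down to the fact that a job can be run directly from Python or from the CLI; dagit is only needed for the graphical view. A small self-contained sketch, again against the 0.13-era API and with made-up file and job names:

    # say_hello.py - a job that can be run without dagit
    from dagster import job, op


    @op
    def say_hello(context):
        context.log.info("hello from dagster")


    @job
    def hello_job():
        say_hello()


    if __name__ == "__main__":
        # Python API: execute the job in-process; no scheduler or UI required.
        result = hello_job.execute_in_process()
        print("run succeeded:", result.success)

The same file can also be run from the command line with something like `dagster job execute -f say_hello.py`, or loaded into the web UI with `dagit -f say_hello.py`; which entry point you use is up to you.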
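
Similarly, the built-in scheduler mentioned in the "Scheduling tools" and "pipeline automation" posts amounts to attaching a cron expression to a job and letting the dagster-daemon process fire it, instead of maintaining a crontab or Task Scheduler entry. A rough sketch with hypothetical names, again using the 0.13-era API:

    from dagster import ScheduleDefinition, job, op, repository


    @op
    def refresh_report(context):
        context.log.info("refreshing the daily report")


    @job
    def daily_report_job():
        refresh_report()


    # Equivalent in spirit to a crontab line such as "0 6 * * * python run_report.py",
    # but managed by dagster and visible in dagit.
    daily_report_schedule = ScheduleDefinition(
        job=daily_report_job,
        cron_schedule="0 6 * * *",
    )


    @repository
    def my_repository():
        # dagit and the dagster-daemon load jobs and schedules from a repository.
        return [daily_report_job, daily_report_schedule]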

What are some alternatives?

When comparing Airflow and dagster you can also consider the following projects:

Kedro - A Python framework for creating reproducible, maintainable and modular data science code.

Prefect - The easiest way to automate your data

luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Dask - Parallel computing with task scheduling

Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.

airbyte - Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Pinball - A scalable workflow manager developed at Pinterest.

Numba - NumPy aware dynamic Python compiler using LLVM

meltano - An open-source ELT platform for building and running data pipelines.

Poetry - Python dependency management and packaging made easy.

n8n - Free and open fair-code licensed node based Workflow Automation Tool. Easily automate tasks across different services.