Airflow VS dagster

Compare Airflow vs dagster and see what their differences are.

Airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows (by apache)
                 Airflow              dagster
Mentions         143                  39
Stars            29,004               6,364
Growth           1.5%                 3.9%
Activity         10.0                 10.0
Latest commit    about 22 hours ago   7 days ago
Language         Python               Python
License          Apache License 2.0   Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

Airflow

Posts with mentions or reviews of Airflow. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-20.
  • Building a Data Lakehouse for Analyzing Elon Musk Tweets using MinIO, Apache Airflow, Apache Drill and Apache Superset
    4 projects | dev.to | 20 Jan 2023
    💡 You can read more here.
  • How do you manage scheduled tasks?
    4 projects | reddit.com/r/selfhosted | 10 Jan 2023
    It's a bit overkill, but I use Airflow with the local executor.
  • Twitter Data Pipeline with Apache Airflow + MinIO (S3 compatible Object Storage)
    5 projects | dev.to | 6 Jan 2023
    To learn more about it, I built a data pipeline that uses Apache Airflow to pull Elon Musk tweets via the Twitter API and store the result as a CSV in a MinIO (OSS alternative to AWS S3) object storage bucket.
  • Data Analytics at Potloc I: Making data integrity your priority with Elementary & Meltano
    4 projects | dev.to | 5 Jan 2023
    Airflow
  • self hosted Alternative to easycron.com?
    7 projects | reddit.com/r/selfhosted | 30 Dec 2022
  • Azure OAuth CSRF State Not Equal Error
    2 projects | reddit.com/r/apache_airflow | 20 Dec 2022
    I am currently having a problem trying to enable Azure OAuth to authenticate into our Airflow instance. I have posted in countless other places trying to get answers, so this is the next place I am trying. Here is the link to the discussion I posted in the Airflow repo: https://github.com/apache/airflow/discussions/28098, but I will also take the liberty of posting it here as well. If anybody has any knowledge or can help, I would greatly appreciate it, as I have been dealing with this for over a month with no answers.
  • ETL tool
    3 projects | reddit.com/r/dataengineering | 24 Nov 2022
    Airflow is really popular, started at Airbnb. Pros: huge community, super mature. Cons: generic workflow orchestration, not the best for handling only data, hard to scale and maintain.
  • How to do distributed cronjobs with worker queues?
    8 projects | reddit.com/r/golang | 12 Nov 2022
    Airflow might also be a good option for you. Essentially DAGs of cronjobs. We like it a lot. (A minimal DAG sketch illustrating this idea follows this list.)
  • Airflow :: Deploy Apache Airflow on Rancher K3s
    3 projects | dev.to | 6 Nov 2022
    $ helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace
    Release "airflow" does not exist. Installing it now.
    NAME: airflow
    LAST DEPLOYED: Sun Nov 6 02:06:55 2022
    NAMESPACE: airflow
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    Thank you for installing Apache Airflow 2.4.1!

    Your release is named airflow.

    You can now access your dashboard(s) by executing the following command(s) and visiting the corresponding port at localhost in your browser:

    Airflow Webserver:
        kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow

    Default Webserver (Airflow UI) Login credentials:
        username: admin
        password: admin
    Default Postgres connection credentials:
        username: postgres
        password: postgres
        port: 5432

    You can get Fernet Key value by running the following:

        echo Fernet Key: $(kubectl get secret --namespace airflow airflow-fernet-key -o jsonpath="{.data.fernet-key}" | base64 --decode)

    ###########################################################
    #  WARNING: You should set a static webserver secret key  #
    ###########################################################
    You are using a dynamically generated webserver secret key, which can lead to unnecessary restarts of your Airflow components.
    Information on how to set a static webserver secret key can be found here:
    https://airflow.apache.org/docs/helm-chart/stable/production-guide.html#webserver-secret-key
  • Duct Size vs. Airflow (2012)
    2 projects | news.ycombinator.com | 18 Oct 2022
    I gotta admit, my first thought was "Duct Size" is a weird name for a distributed work-flow tool[1].

    [1] https://airflow.apache.org/
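
Several of the posts above point at the same core idea: an Airflow deployment is essentially a set of cron jobs expressed as DAGs, and for a small self-hosted setup the local executor is usually enough. As a rough illustration of what those commenters mean, here is a minimal sketch of a DAG on a cron schedule; the DAG name, task names, commands, and schedule are made up for the example and are not taken from any of the posts above.

    # Minimal Airflow 2.x DAG sketch: two bash tasks on a nightly cron schedule.
    # All names and commands here are illustrative, not from the posts above.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="nightly_export",          # hypothetical DAG name
        schedule_interval="0 2 * * *",    # plain cron expression: 02:00 every day
        start_date=datetime(2023, 1, 1),
        catchup=False,                    # don't backfill missed runs
        default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        extract = BashOperator(
            task_id="extract",
            bash_command="echo 'pull data from an API'",
        )
        load = BashOperator(
            task_id="load",
            bash_command="echo 'write the CSV to object storage'",
        )

        # The dependency operator is what turns independent cron jobs into a DAG.
        extract >> load

With the local executor mentioned in the r/selfhosted post, these tasks run as subprocesses on the scheduler host, which is typically sufficient for a small self-hosted installation; the Celery or Kubernetes executors only become necessary when work has to be spread across machines.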

dagster

Posts with mentions or reviews of dagster. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-01-23.

What are some alternatives?

When comparing Airflow and dagster you can also consider the following projects:

Prefect - The easiest way to build, run, and monitor data pipelines at scale.

Kedro - A Python framework for creating reproducible, maintainable and modular data science code.

luigi - Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

n8n - Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.

Dask - Parallel computing with task scheduling

Apache Spark - A unified analytics engine for large-scale data processing

airbyte - Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.

Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Apache Camel - Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.

argo - Workflow engine for Kubernetes

Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing