SaaSHub helps you find the best software and product alternatives Learn more →
Top 13 Python apache-airflow Projects
-
For senior engineers building custom job-data visualization pipelines, the single biggest latency gain comes from pre-aggregating frequently accessed metrics instead of running joins at query time. In our benchmarks, queries against a raw job_postings table with 1M rows averaged 210ms, while pre-aggregated tables (PostgreSQL materialized views refreshed hourly) reduced query time to 12ms. A scheduler such as Apache Airflow 2.7.3 can run the materialized view refreshes during off-peak hours. One example is a materialized view that stores the average salary by company.
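The average-salary-by-company view can be sketched as below. This is a minimal illustration, not the benchmarked setup: the view name `avg_salary_by_company`, the `company` and `salary` columns on `job_postings`, and the helper `refresh_view` are all assumptions introduced for the example.

```python
"""Sketch: define and refresh a PostgreSQL materialized view (names are hypothetical)."""

# Assumed view name; not taken from the original text.
VIEW_NAME = "avg_salary_by_company"

# Assumes job_postings has `company` and `salary` columns.
CREATE_VIEW_SQL = f"""
CREATE MATERIALIZED VIEW IF NOT EXISTS {VIEW_NAME} AS
SELECT company,
       AVG(salary) AS avg_salary,
       COUNT(*)    AS posting_count
FROM job_postings
GROUP BY company;
"""

# A plain REFRESH blocks reads on the view while it runs, which is one
# reason to schedule it during off-peak hours.
REFRESH_SQL = f"REFRESH MATERIALIZED VIEW {VIEW_NAME};"


def refresh_view(conn):
    """Create the view if it is missing, then refresh it.

    `conn` is any DB-API 2.0 connection to PostgreSQL.
    """
    cur = conn.cursor()
    try:
        cur.execute(CREATE_VIEW_SQL)
        cur.execute(REFRESH_SQL)
        conn.commit()
    finally:
        cur.close()
```

A scheduled job, for instance an Airflow task, could call `refresh_view` with a connection from a DB-API driver such as psycopg2. Dashboards would then read from the view instead of aggregating `job_postings` at query time.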
astronomer-cosmos
Run your dbt Core or dbt Fusion projects as Apache Airflow DAGs and Task Groups with a few lines of code
-
couler
Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.
-
ethereum-etl-airflow
Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
-
Project mention: Agent Skills for Data Engineering (Airflow, Dbt, Analytics) | news.ycombinator.com | 2026-02-25
-
MCP-Airflow-API
⚡ Control Apache Airflow with natural language via MCP. Chat with your workflows using Claude, GPT, or any LLM — no REST API calls needed. Supports Airflow 2.x (43 tools) & 3.0+ (45+ tools).
Project mention: Show HN: MCP-Server for Control Airflow Cluster | news.ycombinator.com | 2025-08-16
-
covid-19-data-engineering-pipeline
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
-
e2e-structured-streaming
End-to-end data pipeline that ingests, processes, and stores data. It uses Apache Airflow to schedule scripts that fetch data from an API, sends the data to Kafka, and processes it with Spark before writing to Cassandra. The pipeline, built with Python and Apache Zookeeper, is containerized with Docker for easy deployment and scalability.
-
twitter_data-lakehouse_minio_drill_superset
Building a Data Lakehouse for Analyzing Elon Musk Tweets using MinIO, Apache Airflow, Apache Drill and Apache Superset
Python apache-airflow discussion
Python apache-airflow related posts
-
Airflow Registry
-
Introduction to Apache Airflow
-
Personal Picks: Data Product News (October 1, 2025)
-
Top ETL Tools for MongoDB in 2025: Which One Fits Your Use Case?
-
Apache Airflow
-
Building Effective AI Agents \ Anthropic
-
AI Is Spamming Open Source Repos with Fake Issues
-
A note from our sponsor - SaaSHub
www.saashub.com | 10 Jun 2026
Index
What are some of the best open-source apache-airflow projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Airflow | 45,711 |
| 2 | elyra | 1,993 |
| 3 | airflow-maintenance-dags | 1,770 |
| 4 | astronomer-cosmos | 1,214 |
| 5 | couler | 941 |
| 6 | ethereum-etl-airflow | 441 |
| 7 | agents | 381 |
| 8 | airflow-chart | 297 |
| 9 | MCP-Airflow-API | 48 |
| 10 | covid-19-data-engineering-pipeline | 24 |
| 11 | e2e-structured-streaming | 21 |
| 12 | F2-Data-Pipeline | 10 |
| 13 | twitter_data-lakehouse_minio_drill_superset | 5 |