apache-airflow

Open-source projects categorized as apache-airflow

Top 16 apache-airflow Open-Source Projects

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12

    Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.

  • elyra

    Elyra extends JupyterLab with an AI-centric approach.

  • airflow-maintenance-dags

    A series of DAGs/Workflows to help maintain the operation of Airflow

  • couler

    Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

  • Project mention: (Not) to Write a Pipeline | news.ycombinator.com | 2023-06-27

    author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/ . or see for example https://github.com/couler-proj/couler , which is an sdk for describing tasks that may be submitted to different workflow engines on the backend.

    it's a little confusing to me that the author seems to object to "pipelines" and then equate them with messaging-queues. for me at least, "pipeline" vs "workflow-engine" vs "scheduler" are all basically synonyms in this context. those things may or may not be implemented with a message-queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.

    i agree with author that message-queues should not be a knee-jerk response to most problems because the LoE for edge-cases/observability/monitoring is huge. (maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) but don't build the scheduler from scratch either.. use argowf, kubeflow, or a more opinionated framework like airflow, mlflow, databricks, aws lambda or step-functions. all/any of these should have config or api that's robust enough to express rate-limit/retry stuff. almost any of these choices has better observability out-of-the-box than you can easily get from a queue. but most importantly.. they provide idioms for handling failure that data-science folks and junior devs can work with. the right way to structure code is just much more clear, and things like structuring messages/events, subclassing workers, repeating/retrying tasks, are just harder to mess up.

  • astronomer-cosmos

    Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code

  • ethereum-etl-airflow

    Airflow DAGs for exporting, loading, and parsing the Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

  • Project mention: ethereum-etl-airflow: NEW Data - star count:358.0 | /r/algoprojects | 2023-07-10
  • astro-cli

    CLI that makes it easy to create, test and deploy Airflow DAGs to Astronomer

  • Project mention: Run Apache Airflow through Docker | /r/dataengineering | 2023-06-25
  • astro-sdk

    Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

  • Project mention: Orchestration: Thoughts on Dagster, Airflow and Prefect? | /r/dataengineering | 2023-06-01

    Have you tried the Astro SDK? https://github.com/astronomer/astro-sdk

  • airflow-chart

    A Helm chart to install Apache Airflow on Kubernetes

  • airflowctl

    A CLI tool to streamline getting started with Apache Airflow™ and managing multiple Airflow projects

  • Project mention: A look at airflowctl, a tool to help developers manage Apache Airflow projects | dev.to | 2023-08-14

    NOTE! I found a small issue in that when you run in background mode, it creates a file (.airflowctl/.background_process_ids) which contains the parent PID. The PID was always off, so I needed to manually edit this. I have created an issue here so if this happens to you, follow that.

  • uber-expenses-tracking

    The goal of this project is to track the expenses of Uber Rides and Uber Eats through data engineering processes using technologies such as Apache Airflow, AWS Redshift and Power BI.

  • terraform-aws-mwaa

    Terraform module for Amazon MWAA (Apache Airflow)

  • Masters-Thesis-on-Big-Data

    Master's thesis on Big Data

  • covid-19-data-engineering-pipeline

    A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.

  • F2-Data-Pipeline

    Pipeline for Automated Updates of Kaggle's "Formula 2 Dataset"

  • Project mention: First End-to-End Data Engineering Project: Formula 2 Data Pipeline for Automated Updates of a Kaggle dataset. | /r/dataengineering | 2023-07-06

  • twitter_data-lakehouse_minio_drill_superset

    Building a Data Lakehouse for Analyzing Elon Musk Tweets using MinIO, Apache Airflow, Apache Drill and Apache Superset

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

apache-airflow related posts

  • Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions

    1 project | dev.to | 12 Feb 2024
  • Navigating Week Two: Insights and Experiences from My Tublian Internship Journey

    1 project | dev.to | 31 Dec 2023
  • Best ETL Tools And Why To Choose

    1 project | /r/tactionsoftware | 11 Nov 2023
  • Simplifying Data Transformation in Redshift: An Approach with DBT and Airflow

    2 projects | dev.to | 7 Nov 2023
  • Share Your favorite python related software!

    1 project | /r/Python | 1 Oct 2023
  • "You came to protest for access to the voting machines' source code. What is source code?" "I don't know" 🤡

    1 project | /r/brasil | 29 Jun 2023
  • Run Apache Airflow through Docker

    1 project | /r/dataengineering | 25 Jun 2023

Index

What are some of the best open-source apache-airflow projects? This list will help you:

Rank Project Stars
1 Airflow 34,570
2 elyra 1,776
3 airflow-maintenance-dags 1,601
4 couler 889
5 astronomer-cosmos 457
6 ethereum-etl-airflow 387
7 astro-cli 324
8 astro-sdk 319
9 airflow-chart 267
10 airflowctl 169
11 uber-expenses-tracking 94
12 terraform-aws-mwaa 33
13 Masters-Thesis-on-Big-Data 22
14 covid-19-data-engineering-pipeline 22
15 F2-Data-Pipeline 8
16 twitter_data-lakehouse_minio_drill_superset 3
