The Unbundling of Airflow

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • docker-airflow

    Docker Apache Airflow

  • I understand it is subjective, but I use a forked version of https://github.com/puckel/docker-airflow on our managed K8s cluster, and it points to a cloud-managed Postgres. It has worked pretty well for over three years with no one actually managing it from an infra POV. YMMV. It drives a product whose ARR is well into the hundreds of millions.

    If you have simple needs that are more or less set, I agree Airflow is overkill and a simple Jenkins instance is all you need.

  • orchest

    Build data pipelines, the easy way 🛠️

  • I like this post because in many ways it highlights how much Airflow has helped shape the modern data stack.

    As mentioned in this thread, managing Airflow can quickly become complicated. Its flexibility means you can stretch Airflow in pretty interesting ways, especially when pairing it with container orchestrators like k8s.

    To combat that complexity and reduce the operational burden of letting a data team create and deploy batch processing pipelines, we created https://github.com/orchest/orchest.

    We suspect that many standardized use cases (like reverse ETL) will start disappearing from custom batch pipelines. But there’s a long tail of data processing tasks for which having freedom to invoke your language of choice has significant advantages. Not to mention stimulating innovative ideas (why not use Julia for one of your processing steps?).

  • ploomber

    The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

  • Well written. I think Airflow is being enforced in organizations as the main orchestrator even though it's not always the right tool for the job. In addition, organizations have to enforce a microservices approach to get modular components. Besides that, managing those frameworks is a nightmare. We built Ploomber (https://github.com/ploomber/ploomber) specifically for this reason: modular components and easy deployments. It standardizes your pipelines and allows you to deploy seamlessly on Airflow, Argo (Kubernetes), Kubeflow and cloud providers.
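Both Orchest and Ploomber describe a pipeline as modular steps wired together by their dependencies, which is also the core idea behind an Airflow DAG. As a rough illustration of that shared idea, here is a minimal, generic sketch in plain Python, assuming Python 3.9+ for the standard-library `graphlib` module. The task names (`extract`, `clean`, `report`) and the `run_pipeline` helper are hypothetical and are not the API of Airflow, Orchest, or Ploomber.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

# Hypothetical tasks: each receives a dict of its upstream tasks' outputs.
def extract(upstream: Dict[str, Any]) -> list:
    return [3, 1, 2]

def clean(upstream: Dict[str, Any]) -> list:
    return sorted(upstream["extract"])

def report(upstream: Dict[str, Any]) -> str:
    rows = upstream["clean"]
    return f"{len(rows)} rows, max={rows[-1]}"

TASKS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "extract": extract,
    "clean": clean,
    "report": report,
}

# Maps each task to the set of tasks it depends on (its predecessors).
DEPENDENCIES: Dict[str, Set[str]] = {
    "extract": set(),
    "clean": {"extract"},
    "report": {"clean"},
}

def run_pipeline(tasks, dependencies) -> Dict[str, Any]:
    """Run every task once, after all of its dependencies have finished."""
    results: Dict[str, Any] = {}
    # static_order() yields nodes in dependency order and raises
    # graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, set())}
        results[name] = tasks[name](upstream)
    return results

if __name__ == "__main__":
    print(run_pipeline(TASKS, DEPENDENCIES)["report"])  # prints "3 rows, max=3"
```

Real orchestrators layer scheduling, retries, and distributed or containerized execution on top of this ordering step; the sketch only shows dependency resolution and passing outputs between modular steps.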
