I understand it is subjective, but I use a forked version of https://github.com/puckel/docker-airflow on our managed K8s cluster, pointed at a cloud-managed Postgres. It has worked pretty well for over 3 years with no one actually managing it from an infra POV. YMMV. This setup drives a product whose ARR is well into the hundreds of millions.
If you have simple needs that are more or less set, I agree Airflow is overkill and a simple Jenkins instance is all you need.
I like this post because, in many ways, it highlights how much Airflow has helped shape the modern data stack.
As mentioned in this thread, managing Airflow can quickly become complicated. Its flexibility means you can stretch Airflow in pretty interesting ways, especially when pairing it with a container orchestrator like k8s.
To combat that complexity and reduce the operational burden of letting a data team create and deploy batch processing pipelines, we created https://github.com/orchest/orchest.
We suspect that many standardized use cases (like reverse ETL) will start disappearing from custom batch pipelines. But there’s a long tail of data processing tasks for which having freedom to invoke your language of choice has significant advantages. Not to mention stimulating innovative ideas (why not use Julia for one of your processing steps?).
Well written. I think Airflow is being enforced in organizations as the main orchestrator even though it's not always the right tool for the job. In addition, organizations have to enforce a microservices approach to get modular components. On top of that, managing those frameworks is a nightmare. We built Ploomber (https://github.com/ploomber/ploomber) specifically for this reason: modular components and easy deployments. It standardizes your pipelines and lets you deploy seamlessly on Airflow, Argo (Kubernetes), Kubeflow, and cloud providers.