-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Each pipeline node represents a task in the DAG and is executed in Apache Airflow with the help of a custom NotebookOp operator. The operator also performs pre- and post processing operations, that, for example, make it possible to share data between multiple tasks using shared cloud storage.
The orchestration flow editor, which is shown in the screen capture below, is based on the Elyra canvas open source project.
Without going into details how, you could implement this workflow using the generic PythonVirtualEnvOperator to run the Python script and the special purpose PapermillOperator to run the notebook. If the latter doesn't meet your needs, you'd have to implement its functionality yourself by developing code that performs custom pre-processing, uses the papermill Python package to run the notebook, and performs custom post-processing.
Apache Airflow is an open source workflow management platform that allows for programmatic creation, scheduling, and monitoring of workflows. In Apache Airflow a workflow (or pipeline) is called a Directed Acyclic Graph (DAG) that comprises of one or more related tasks. A task represents a unit of work (such as the execution of a Python script) and is implemented by an operator. Airflow comes with built-in operators, such as the PythonOperator, and can be extended using provider packages.
Related posts
-
Introducing Elyra pipelines with custom component support
-
Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions
-
Show HN: Marimo – an open-source reactive notebook for Python
-
Navigating Week Two: Insights and Experiences from My Tublian Internship Journey
-
Airflow VS quix-streams - a user suggested alternative
2 projects | 7 Dec 2023