Automate your machine learning workflow tasks using Elyra and Apache Airflow

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • airflow-notebook

    Discontinued. This repository is no longer maintained.

  • Each pipeline node represents a task in the DAG and is executed in Apache Airflow with the help of a custom NotebookOp operator. The operator also performs pre- and post-processing operations that, for example, make it possible to share data between multiple tasks using shared cloud storage.

  • canvas

  • The orchestration flow editor is based on the Elyra canvas open source project.

  • papermill

    📚 Parameterize, execute, and analyze notebooks

  • Without going into the details, you could implement this workflow using the generic PythonVirtualenvOperator to run the Python script and the special-purpose PapermillOperator to run the notebook. If the latter doesn't meet your needs, you'd have to implement its functionality yourself by writing code that performs custom pre-processing, uses the papermill Python package to run the notebook, and performs custom post-processing.

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • Apache Airflow is an open source workflow management platform that allows for programmatic creation, scheduling, and monitoring of workflows. In Apache Airflow, a workflow (or pipeline) is represented as a Directed Acyclic Graph (DAG) comprising one or more related tasks. A task represents a unit of work (such as the execution of a Python script) and is implemented by an operator. Airflow comes with built-in operators, such as the PythonOperator, and can be extended using provider packages.
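
The excerpts above describe two ideas: a workflow as a DAG of tasks, and a notebook task that wraps custom pre-processing, a papermill run, and custom post-processing. The following is a minimal sketch of both ideas in plain Python. It is not Airflow's API: in Airflow each task would be implemented by an operator, and the scheduler would resolve the ordering. The names run_notebook_task, run_dag, papermill_runner, and demo, as well as the notebook file names and the "sample" parameter, are hypothetical and chosen only for illustration. The only third-party call assumed is papermill's execute_notebook, which is imported lazily so the sketch runs without papermill installed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

# Signature of whatever actually executes a notebook (hypothetical alias).
NotebookRunner = Callable[[str, str, dict], None]


def papermill_runner(input_path: str, output_path: str, parameters: dict) -> None:
    """Run a notebook with the papermill package (imported only when used)."""
    import papermill as pm  # third-party dependency

    pm.execute_notebook(input_path, output_path, parameters=parameters)


def run_notebook_task(
    input_path: str,
    output_path: str,
    parameters: dict,
    pre: Callable[[], None],
    post: Callable[[], None],
    runner: NotebookRunner = papermill_runner,
) -> None:
    """Custom pre-processing, then the notebook run, then custom post-processing."""
    pre()    # e.g. fetch input files from shared storage
    runner(input_path, output_path, parameters)
    post()   # e.g. publish output files for downstream tasks


def run_dag(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, List[str]],
) -> List[str]:
    """Run every task after all of its upstream tasks; return the order used.

    `upstream` maps a task name to the names of the tasks it depends on.
    TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    """
    order = list(TopologicalSorter(upstream).static_order())
    for name in order:
        tasks[name]()
    return order


def demo() -> List[str]:
    """Wire a script task and a notebook task into a two-node DAG."""
    events: List[str] = []

    def fake_runner(input_path: str, output_path: str, parameters: dict) -> None:
        # Stand-in for papermill so the sketch runs without extra packages.
        events.append(f"run {input_path} -> {output_path}")

    tasks = {
        "prepare_data": lambda: events.append("script"),
        "analyze": lambda: run_notebook_task(
            "analyze.ipynb",
            "analyze-out.ipynb",
            {"sample": 10},
            pre=lambda: events.append("pre"),
            post=lambda: events.append("post"),
            runner=fake_runner,
        ),
    }
    # "analyze" depends on "prepare_data", so the script task runs first.
    run_dag(tasks, {"analyze": ["prepare_data"]})
    return events
```

Calling demo() records the script task first, followed by the pre-processing hook, the (faked) notebook run, and the post-processing hook. Passing the runner as a parameter keeps the pre/post wrapper independent of how the notebook is actually executed; leaving the default in place would hand the notebook to papermill instead.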

