Python Pipeline

Open-source Python projects categorized as Pipeline

Top 17 Python Pipeline Projects

  • GitHub repo great_expectations

    Always know what to expect from your data.

    Project mention: Just starting to get into automated testing, should I be looking for a dedicated tool or library for data engineering specifically? | 2021-05-14

    You can also extend existing frameworks if necessary. For example, both pytest and Great Expectations are extensible.

  • GitHub repo papermill

    📚 Parameterize, execute, and analyze notebooks

    Project mention: Team with no data science infrastructure/knowledge (crawl/walk/run) | 2021-06-03

    You can look into papermill, for example.

  • GitHub repo Kedro

    A Python framework for creating reproducible, maintainable and modular data science code.

    Project mention: I Started Streaming on Twitch | 2021-06-12

    It all started with kedro/issues/606, where Yetu called for Kedro users to record themselves walking through the tutorials. I wanted to do this, but I was stuck on the fact that recording and editing even a somewhat polished video is quite time-consuming for me.

  • GitHub repo PyFunctional

    Python library for creating data pipelines with chain functional programming

    Project mention: PyFunctional makes creating data pipelines easy by using chained functional operators | 2021-03-31

  • GitHub repo mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

    Project mention: Using PostgreSQL as a Data Warehouse | 2021-05-10

    The tooling behind this approach is a set of Python packages named Mara. It is available on GitHub:

    Additional packages can be found at the Mara org:

  • GitHub repo galaxy

    Data intensive science for everyone. (by galaxyproject)

    Project mention: Developed a new kind of dual extruder system on fully custom built 3D printer | 2021-03-01

  • GitHub repo pdpipe

    Easy pipelines for pandas DataFrames.

  • GitHub repo bodywork

    MLOps tool for deploying machine learning projects to Kubernetes.

    Project mention: Deployment automation for ML projects of all shapes and sizes | 2021-06-09

  • GitHub repo pypyr automation task runner

    pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.

  • GitHub repo forte

    Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project:

    Project mention: Building Modular and Re-purposable NLP Pipelines | 2021-03-02

    Introducing Forte, from the CASL open-source project at Petuum. Forte combines multiple NLP tools to construct an entire NLP pipeline with a few lines of Python and to extend it to different domains.

  • GitHub repo pierogis

    image and animation processing framework

    Project mention: pierogis/pierogis a framework for image and animation processing | 2021-02-22

  • GitHub repo cool

    Make Python code cooler. Less is more. (by abersheeran)

    Project mention: Simple, efficient and pure Python implementation of Python pipeline operations | 2021-05-17

  • GitHub repo alkymi

    Pythonic task automation

    Project mention: Alkymi – Data/Task Automation in Python | 2021-03-23

  • GitHub repo spline

    Spline is a tool that is capable of running locally as well as part of well known pipelines like Jenkins (Jenkinsfile), Travis CI (.travis.yml) or similar ones. (by Nachtfeuer)

  • GitHub repo abcd-hcp-pipeline

    BIDS application for processing functional MRI data, robust to scanner, acquisition, and age variability.

    Project mention: Siemens output from ABCD T1 and T2 sequences. | 2021-02-08

    Who provided the sequence? They're usually the point of contact for this kind of question. Alternatively, you can bug one of the processing groups for ABCD (link), and they might point you in the right direction. Getting one of the ABCD or ABIDE/HCP sequence designers to see this on Reddit is unlikely, but good luck.

  • GitHub repo magda

    Library for building Modular and Asynchronous Graphs with Directed and Acyclic edges (MAGDA)

    Project mention: MAGDA – our open-source solution for spaghetti code | 2021-04-14

    We would like to introduce you to our latest open-source library: MAGDA. The name is an abbreviation for “Modular Asynchronous Graphs with Directed and Acyclic edges”, which fully describes the idea behind it. The library enables building modular data pipelines with asynchronous processing in, for example, machine learning and data science projects. It is intended for Python projects and is available on the NeuroSYS GitHub, as well as on PyPI. It aids our R&D teams not only by introducing some abstraction (classes and functions) but also by imposing an architectural pattern on the project.

  • GitHub repo raiven

    A framework for the translation of AI tools to the radiology environment

    Project mention: Any interesting open projects to join? Or anyone want with some good ideas want to start one? | 2021-02-05

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-06-12.


What are some of the best open-source Pipeline projects in Python? This list will help you:

 #  Project                       Stars
 1  great_expectations            4,721
 2  papermill                     4,239
 3  Kedro                         4,103
 4  PyFunctional                  1,878
 5  mara-pipelines                1,744
 6  galaxy                          853
 7  pdpipe                          609
 8  bodywork                        258
 9  pypyr automation task runner    189
10  forte                           120
11  pierogis                        107
12  cool                             87
13  alkymi                           38
14  spline                           29
15  abcd-hcp-pipeline                15
16  magda                            10
17  raiven                            9