Python Pipeline

Open-source Python projects categorized as Pipeline

Top 23 Python Pipeline Projects

  • jina

    🔮 Build multimodal AI services via cloud native technologies

    Project mention: Cross data type search that wasn’t supported well using Elasticsearch | /r/learnprogramming | 2023-04-11

    Jina mainly because of their use of neural networks and AI.

  • Prefect

    The easiest way to build, run, and monitor data pipelines at scale.

    Project mention: self hosted Alternative to | /r/selfhosted | 2022-12-30

  • airbyte

    Data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes.

    Project mention: airbyte VS cloudquery - a user suggested alternative | 2023-06-02
  • great_expectations

    Always know what to expect from your data.

    Project mention: Data Quality at Scale with Great Expectations, Spark, and Airflow on EMR | 2023-04-24

    Great Expectations (GE) is an open-source data validation tool that helps ensure data quality.

  • Kedro

    A Python framework for creating reproducible, maintainable and modular data science code.

    Project mention: A Polars exploration into Kedro | 2023-05-17

    # pyproject.toml
    [project]
    dependencies = [
        "kedro @ git+[email protected]",
        "kedro-datasets[pandas.CSVDataSet,polars.CSVDataSet] @ git+[email protected]#subdirectory=kedro-datasets",
    ]

  • papermill

    📚 Parameterize, execute, and analyze notebooks

    Project mention: Show HN: Mercury – convert Jupyter Notebooks to Web Apps without code rewriting | 2023-06-02

    I'm using Papermill to operationalize notebooks; it also has Airflow support, for example. I'm really happy with Papermill for automatic notebook execution. In my field it's nice that we can go very quickly from analysis to operations while keeping super transparent "logging" in the executed notebooks.

  • pipelines

    Machine Learning Pipelines for Kubeflow

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

    Project mention: Welcome to generate your embeddings with Towhee | 2023-04-20
  • PyFunctional

    Python library for creating data pipelines with chain functional programming

  • mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  • MLBox

    MLBox is a powerful Automated Machine Learning python library.

  • pytorch-toolbelt

    PyTorch extensions for fast R&D prototyping and Kaggle farming

    Project mention: Surrender cards are being distributed to Russian forces: 'Your ticket to a peaceful life. Show this card to a Ukrainian soldier - it will save your life and help you get back home' On the back: a telegram chat & phone number they can contact 'to receive detailed support'. | /r/ukraine | 2022-09-12

    For what it's worth, the QR code points to a URI shortener domain "" which redirects to a cloudflare hosted page that contains telegram information relating to "nikolay_bodenko" and "chaos_admin" ("For advertising and cooperation"? Whatever that is supposed to mean), which led me to "rf200_now" and "rf200_nooow" (looks to get spammed around a bit), which is purportedly supportive of Ukraine.

  • galaxy

    Data intensive science for everyone.

    Project mention: BIOINFORMATICS PROJECT | /r/bioinformatics | 2022-10-16
  • toil

    A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.

  • pypyr automation task runner

    pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.

  • whispers

    Identify hardcoded secrets in static structured text (by Skyscanner)

  • bodywork

    ML pipeline orchestration and model deployments on Kubernetes, made really easy.

  • versatile-data-kit

    Build, run and manage your data pipelines with Python or SQL on any cloud

    Project mention: Looking for a data blogger | /r/opensource | 2023-05-19

    Here's the project:

  • karton

    Distributed malware processing framework based on Python, Redis and S3.

  • fluids

    Fluid dynamics component of Chemical Engineering Design Library (ChEDL)

    Project mention: AbaCalc - Calculator for engineers, students & technicians | /r/react | 2022-08-02

    I’m doing similar things using an api with some Python tools in the background to do the heavy lifting with units and thermo/chemical calcs using the Caleb Bell’s libraries for thermo and fluids

  • aws-lambda-handler-cookbook

    This repository provides a working, deployable, open source based, AWS Lambda handler and CDK Python code. This handler embodies Serverless best practices and has all the bells and whistles for a proper production ready handler.

    Project mention: AWS Lambda Cookbook — Elevate your handler’s code — Part 4 — Environment Variables | 2023-04-03

    This AWS CDK code defines the variables of the schema ‘MyHandlerEnvVars’ and sets their values. Look specifically at ‘__add_get_lambda_integration’ function.

  • forte

    Forte is a flexible and powerful ML workflow builder. This is part of the CASL project.

  • pipeline-live

    Pipeline Extension for Live Trading

    Project mention: How would I start to recode an old python library only compatible with an older version of python (3.6) to become compatible with a newer python version (3.8+)? | /r/learnpython | 2022-11-06

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-06-02.
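
Several entries above, PyFunctional in particular, describe a pipeline as a chain of functional steps applied to a data source. The snippet below is a minimal standard-library sketch of that idea, not the API of PyFunctional or any other listed project. The `Pipeline` class and its method names are hypothetical, chosen only for illustration; the star counts are taken from this page.

```python
from functools import reduce
from typing import Any, Callable, Iterable


class Pipeline:
    """Hypothetical helper that wraps an iterable and chains lazy steps."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = items

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        # Lazily apply fn to every element.
        return Pipeline(fn(x) for x in self._items)

    def filter(self, pred: Callable[[Any], bool]) -> "Pipeline":
        # Lazily keep only the elements for which pred returns True.
        return Pipeline(x for x in self._items if pred(x))

    def reduce(self, fn: Callable[[Any, Any], Any], initial: Any) -> Any:
        # Consume the pipeline and fold it into a single value.
        return reduce(fn, self._items, initial)

    def to_list(self) -> list:
        # Consume the pipeline and materialize the results.
        return list(self._items)


# Three rows copied from the ranking on this page.
rows = [
    {"project": "jina", "stars": 18477},
    {"project": "toil", "stars": 839},
    {"project": "forte", "stars": 220},
]

# Keep projects with at least 500 stars, then upper-case their names.
popular = (
    Pipeline(rows)
    .filter(lambda r: r["stars"] >= 500)
    .map(lambda r: r["project"].upper())
    .to_list()
)

# Sum the star counts of all three rows.
total_stars = Pipeline(rows).map(lambda r: r["stars"]).reduce(lambda a, b: a + b, 0)

print(popular)      # ['JINA', 'TOIL']
print(total_stars)  # 19536
```

Each step returns a new `Pipeline`, so steps can be reordered or dropped without touching the others. Nothing runs until `to_list` or `reduce` consumes the chain, and a chain built on generators can be consumed only once.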

What are some of the best open-source Pipeline projects in Python? This list will help you:

Rank  Project                        Stars
   1  jina                          18,477
   2  Prefect                       12,078
   3  airbyte                       10,796
   4  great_expectations             8,410
   5  Kedro                          8,408
   6  papermill                      5,251
   7  pipelines                      3,205
   8  towhee                         2,250
   9  PyFunctional                   2,180
  10  mara-pipelines                 2,005
  11  MLBox                          1,425
  12  pytorch-toolbelt               1,395
  13  galaxy                         1,109
  14  toil                             839
  15  pypyr automation task runner     523
  16  whispers                         436
  17  bodywork                         421
  18  versatile-data-kit               338
  19  karton                           320
  20  fluids                           275
  21  aws-lambda-handler-cookbook      259
  22  forte                            220
  23  pipeline-live                    199