Top 20 Python Pipeline Projects
Always know what to expect from your data.
Project mention: [P] Deepchecks: an open-source tool for high standards validations for ML models and data. | reddit.com/r/MachineLearning | 2022-01-06
A Python framework for creating reproducible, maintainable and modular data science code.
Project mention: [Discussion] Applied machine learning implementation debate. Is OOP approach towards data preprocessing in python an overkill? | reddit.com/r/MachineLearning | 2021-11-03
I'd focus more on understanding the issues in depth before jumping to a solution. Otherwise, you would be adding hassle with some, bluntly speaking, opinionated and inflexible boilerplate code which not many people will like using.

You mention some issues: code that is non-obvious to understand and hard to execute and replicate. Bad code that does not follow engineering best practices (ideas from SOLID etc.) does not get better if you force the author to introduce certain classes. You can suggest some basics (e.g. a common code formatter, meaningful variable names, short functions, no hard-coded values, ...), but I'm afraid you cannot educate non-engineers in a single-day workshop. I would not focus on that at first. However, there is no excuse for writing bad code and then expecting others to fix it. As you say, data engineering is part of data science skills; you are "junior" if you cannot write reproducible code.

Being hard to execute and replicate is theoretically easy to fix. Force everyone to (at least hypothetically) submit their code into a testing environment where it will be automatically executed on a fresh machine. This means that, first, they have to specify exactly which libraries need to be installed. Second, they need to externalize all configuration, in particular data input and data output paths. Not a single value should be hard-coded in the code! And finally they need a *single* command which can be run to execute the *whole* pipeline. If they fail on any of these parts, they should try again; work that does not pass this test is considered unfinished by its author. Basically you are introducing an automated, infallible test.

Regarding your code, I'd really not try that direction. In particular, even these few lines already look unclear and over-engineered. The csv format is hard-coded into the code; if it changes to parquet, you'd have to touch the code.
The processing object has fixed data paths, which have no place in a job that should handle pure processing. Exporting data is also not something a processing job should handle. And what if you have multiple input and output datasets?

You would not have any of these issues if you had stuck with the simplest solution: a function `process(data1, data2, ...) -> result_data` where dataframes are passed in and out. It would also mean zero additional libraries or boilerplate. I highly doubt that a function `main_pipe(...)` will fix the malpractices some people may commit.

There are two small features which are useful beyond a plain function, though: automatically generating a visual DAG from the code, and quickly checking whether input requirements are satisfied before heavy code is run. You can still put any mature DAG library on top, which probably already incorporates the experience of a lot of developers; no need to rewrite that. I'm not sure which one is best (Metaflow, Luigi, Airflow, ... see https://github.com/pditommaso/awesome-pipeline), but many come with a lot of features. If you want a bit more scaffolding to make foreign projects easier to understand, you could look at https://github.com/quantumblacklabs/kedro, but maybe that's already too much. Fix the "single command replication-from-scratch" requirement first.
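The `process(data1, data2, ...) -> result_data` shape described above can be sketched in a few lines: a pure function in the middle, with all paths and file formats pushed out to a thin entry point. This is a dependency-free illustration using lists of dicts where a real project would pass pandas DataFrames; the column names are hypothetical.

```python
import csv
import sys


def process(orders, customers):
    """Pure processing: data in, data out. No paths, no file formats."""
    names = {c["id"]: c["name"] for c in customers}
    return [
        {"order_id": o["id"], "customer": names.get(o["customer_id"], "unknown")}
        for o in orders
    ]


def read_csv(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def main(orders_path, customers_path, out_path):
    # All I/O and configuration live at the edges, passed in from outside,
    # so the whole pipeline replicates with a single command.
    result = process(read_csv(orders_path), read_csv(customers_path))
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["order_id", "customer"])
        writer.writeheader()
        writer.writerows(result)


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1], sys.argv[2], sys.argv[3])
```

Swapping csv for parquet then only touches `read_csv` and `main`; `process` never changes.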
📚 Parameterize, execute, and analyze notebooks
Project mention: Release of IPython 8.0 | news.ycombinator.com | 2022-01-12
- We mostly use notebooks as scratchpads or alpha prototypes.
- Papermill is a great tool when setting up a scheduled notebook and then shipping the output to S3: https://papermill.readthedocs.io/en/latest/
- When turning notebooks into more user-facing prototypes, I've found Streamlit is excellent for shipping something really fast. Some of these prototypes have stuck around as Streamlit apps when there are 1-3 users who need them regularly.
- Moving to full-blown apps is much tougher and time-consuming.
Python library for creating data pipelines with chain functional programming
Project mention: PyFunctional makes creating data pipelines easy by using chained functional operators | reddit.com/r/Python | 2021-03-31
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Project mention: How to keep track of the different Transformations done in an ETL pipeline? | reddit.com/r/dataengineering | 2021-08-22
The closest I've found is Mara, but it's not quite what I'm after.
Data intensive science for everyone. (by galaxyproject)
Project mention: Developed a new kind of dual extruder system on fully custom built 3D printer | reddit.com/r/3Dprinting | 2021-03-01
LAMA - automatic model creation framework
Project mention: Github Discussion: What is your favorite Data Science Repo? | reddit.com/r/datascience | 2021-07-24
Easy pipelines for pandas DataFrames.
Identify hardcoded secrets in static structured text (by Skyscanner)
Project mention: Skyscanner/whispers - Identify hardcoded secrets and dangerous behaviours | reddit.com/r/GithubSecurityTools | 2021-10-07
ML pipeline orchestration and model deployments on Kubernetes, made really easy.
Project mention: Deployment automation for ML projects of all shapes and sizes | news.ycombinator.com | 2021-06-09
pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.
Project mention: Comparison of Python TOML parser libraries | dev.to | 2021-12-14
The pypyr automation pipeline task-runner open-source project recently added TOML parsing & writing functionality as a core feature. To this end, I researched the available free & open-source Python TOML parser libraries to figure out which option to use.
Distributed malware processing framework based on Python, Redis and MinIO.
Project mention: Using a Virtual Machine to Isolate and Test Files for Malware | reddit.com/r/vmware | 2022-01-13
I did something along the lines of what you describe at work. The easiest way to check files is of course uploading their hashes to VirusTotal (it's free!), but if you still want to set up an automated malware analysis lab, then VMware is a decent choice.

You should have a reasonably beefy VM: at least 16 GB of RAM, a couple of CPU cores, and a rather large disk; also make sure you expose hardware virtualization to this guest. You want the machine to have slightly better specs than a regular Windows PC, so that malware won't think "Oh hey, this computer I am on has suspiciously low specs - it's probably a VM! Better delete myself to hinder any threat hunting efforts."

On that machine you should install a Linux distro, Ubuntu for example. Then you should install a sandbox, for example Cuckoo (it works well on vSphere/ESXi guests). I know other sandbox software exists, but I worked with this one and it performed alright. Installing and configuring Cuckoo is a bit more involved than I'd like to get into in this comment, but I'm sure you will figure it out with the numerous tutorials and documentation pages available. Take a look at the Volatility framework too!

For automation you might want to check out the Karton framework (https://github.com/CERT-Polska/karton). I haven't used it, but I had the chance to talk to its authors and it seems dope.
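The hash-lookup route mentioned above needs nothing beyond the standard library. A minimal sketch that computes the SHA-256 of a file (VirusTotal accepts MD5, SHA-1, or SHA-256 lookups); the function name is ours, not from any of the tools above:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large samples never load into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# The resulting hex digest can be pasted into the VirusTotal search box
# (or sent to its API) to check whether the sample is already known.
```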
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Project mention: Building Modular and Re-purposable NLP Pipelines | reddit.com/r/learnmachinelearning | 2021-03-02
Introducing Forte, from the CASL open-source project at Petuum. Forte combines multiple NLP tools to construct an entire NLP pipeline in a few lines of Python, and extends it to different domains.
image and animation processing framework
Project mention: pierogis/pierogis a framework for image and animation processing | reddit.com/r/Python | 2021-02-22
Make Python code cooler. Less is more. (by abersheeran)
Project mention: Simple, efficient and pure Python implementation of Python pipeline operations | reddit.com/r/Python | 2021-05-17
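The pure-Python pipeline-operator idea can be reproduced in a few lines by overloading `__ror__`, so values flow left to right through `|`. This is a minimal sketch of the general technique, not the library's actual implementation:

```python
class Pipe:
    """Wrap a function so that `value | Pipe(f)` evaluates f(value)."""

    def __init__(self, fn, *args, **kwargs):
        self.fn = fn
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, value):
        # Python calls this when the left operand's __or__ doesn't apply.
        return self.fn(value, *self.args, **self.kwargs)


evens = Pipe(lambda xs: [x for x in xs if x % 2 == 0])
double = Pipe(lambda xs: [x * 2 for x in xs])
total = Pipe(sum)

result = [1, 2, 3, 4] | evens | double | total
print(result)  # 12
```

Because each stage is just a wrapped function, no extra framework or boilerplate is needed to compose new pipelines.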
Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.
Project mention: I built an NLP pipeline for analyzing supplement reviews called Healthsea 🐳 | reddit.com/r/Python | 2022-01-06
Pythonic task automation
Project mention: Alkymi – Data/Task Automation in Python | reddit.com/r/programming | 2021-03-23
Spline is a tool capable of running locally as well as part of well-known CI pipelines like Jenkins (Jenkinsfile), Travis CI (.travis.yml), or similar. (by Nachtfeuer)
BIDS application for processing functional MRI data, robust to scanner, acquisition and age variability.
Project mention: Siemens output from ABCD T1 and T2 sequences. | reddit.com/r/neuroscience | 2021-02-08
Who provided the sequence? They're usually the point of contact for this kind of question. Alternatively, you can bug one of the processing groups for ABCD (link), and they might point you in the right direction. The chance of one of the ABCD or ABIDE/HCP sequence designers seeing this on Reddit is low, but good luck.
Library for building Modular and Asynchronous Graphs with Directed and Acyclic edges (MAGDA)
Project mention: MAGDA – our open-source solution for spaghetti code | dev.to | 2021-04-14
We would like to introduce you to our latest open-source library: MAGDA. The name is an abbreviation for “Modular Asynchronous Graphs with Directed and Acyclic edges”, which fully describes the idea behind it. The library enables building modular data pipelines with asynchronous processing in, e.g., machine learning and data science projects. It is dedicated to Python projects and is available on the NeuroSYS GitHub as well as in the PyPI repository. It aids our R&D teams not only by introducing some abstraction (classes and functions) but also by imposing an architectural pattern onto the project.
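The core pattern MAGDA imposes, a directed acyclic graph of modules with asynchronous processing, can be sketched with nothing but `asyncio`: each module awaits its upstream results, so independent branches run concurrently. The module names below are hypothetical and the sketch is not MAGDA's actual API:

```python
import asyncio


async def run_dag():
    async def load():
        await asyncio.sleep(0.01)  # stand-in for real I/O
        return [3, 1, 2]

    async def clean(upstream):
        return sorted(await upstream)

    async def stats(upstream):
        data = await upstream
        return sum(data) / len(data)

    # Wire the graph: clean and stats both consume load's output.
    # Scheduling load as a Task means it runs once; awaiting a finished
    # Task from several consumers just returns its cached result.
    source = asyncio.ensure_future(load())
    cleaned, mean = await asyncio.gather(clean(source), stats(source))
    return cleaned, mean


cleaned, mean = asyncio.run(run_dag())
print(cleaned, mean)  # [1, 2, 3] 2.0
```

A real framework adds what this sketch omits: declarative wiring, validation that the graph is acyclic, and per-module configuration.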
Python Pipeline related posts
[Discussion] Applied machine learning implementation debate. Is OOP approach towards data preprocessing in python an overkill?
3 projects | reddit.com/r/MachineLearning | 3 Nov 2021
Noobie who is trying to use K8s needs confirmation to know if this is the way or he is overestimating Kubernetes.
3 projects | reddit.com/r/kubernetes | 20 Oct 2021
Skyscanner/whispers - Identify hardcoded secrets and dangerous behaviours
1 project | reddit.com/r/GithubSecurityTools | 7 Oct 2021
Creating new Data Pipelines from the command line
1 project | dev.to | 27 Sep 2021
Py Framework for creating reproducible, maintainable, modular datascience code
1 project | news.ycombinator.com | 13 Sep 2021
Github Discussion: What is your favorite Data Science Repo?
4 projects | reddit.com/r/datascience | 24 Jul 2021
I Started Streaming on Twitch
1 project | dev.to | 12 Jun 2021
What are some of the best open-source Pipeline projects in Python? This list will help you:
| # | Project | Stars |
|----|---------|-------|
| 11 | pypyr automation task runner | 254 |