Top 23 Python Pipeline Projects
-
Project mention: Show HN: Flow – A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02
- https://github.com/PrefectHQ/prefect
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Project mention: Migrate connectors from MIT to ELv2 – Pull Request #63723 – airbytehq/airbyte | news.ycombinator.com | 2025-08-15
-
Project mention: Top 40 Open-source Developer Tools with the Most GitHub Stars | dev.to | 2025-04-20
GitHub: https://github.com/Avaiga/taipy
-
marimo
Transform data, train models, and run SQL with marimo — feels like a next-gen reactive notebook, stored as Git-friendly Python. Deploy as scripts, pipelines, endpoints, and apps. All from an AI-native editor (or your own).
Project mention: Show HN: OverType – A Markdown WYSIWYG editor that's just a textarea | news.ycombinator.com | 2025-08-17
I thought it should be extremely portable ("everything just works, it's native"), but it doesn't work on iOS 9.3.6. It doesn't even let me input text into the textarea...
A natural extension seems to be a source code editor with syntax highlighting, like those used in https://marimo.io/, Jupyter, https://plutojl.org/ and other notebook-like Web editors.
-
Project mention: validatelite VS great_expectations - a user suggested alternative | libhunt.com/r/validatelite | 2025-08-08
Great Expectations is a popular open-source data validation framework with rich features and integrations, but it has a steeper learning curve and heavier setup. ValidateLite offers a lightweight, zero-config CLI alternative for quick checks and automation.
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
Project mention: Don't Know These 6 Tools? No Wonder Your Python Development Is So Slow | dev.to | 2025-07-10
👉 https://kedro.org/
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Here, we use the free Mage AI orchestration tool.
-
AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
instill-core
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Project mention: Revolutionizing Unstructured Data: Instill Core – Your All-in-One AI Solution | dev.to | 2025-07-14
-
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
-
toil
A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
-
NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
-
aws-lambda-handler-cookbook
This repository provides a working, deployable, open source-based, serverless service blueprint with an AWS Lambda function and AWS CDK Python code with all the best practices and a complete CI/CD pipeline.
The ‘orders’ service allows users to order products. We will use my open-source Serverless template project: AWS Lambda Handler Cookbook.
-
koheesio
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Python Pipeline related posts
-
Migrate connectors from MIT to ELv2 – Pull Request #63723 – airbytehq/airbyte
-
It's 2025: Your Python Toolbox Is More Than Just PyCharm
-
A Python-first data lakehouse
-
Personal Picks: Data Product News (April 16, 2025)
-
airbyte VS cocoindex - a user suggested alternative
2 projects | 1 Apr 2025
-
Top 17 DevOps AI Tools [2025]
-
Can AI finally generate best practice code? I think so.
Index
What are some of the best open-source Pipeline projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | serve | 21,710 |
2 | Prefect | 20,223 |
3 | airbyte | 19,349 |
4 | Taipy | 18,572 |
5 | marimo | 15,620 |
6 | great_expectations | 10,674 |
7 | Kedro | 10,505 |
8 | Mage | 8,454 |
9 | papermill | 6,252 |
10 | AutoRAG | 4,216 |
11 | pipelines | 3,923 |
12 | towhee | 3,413 |
13 | PyFunctional | 2,468 |
14 | instill-core | 2,282 |
15 | mara-pipelines | 2,080 |
16 | galaxy | 1,589 |
17 | pytorch-toolbelt | 1,556 |
18 | MLBox | 1,520 |
19 | sematic | 996 |
20 | toil | 915 |
21 | NeumAI | 859 |
22 | aws-lambda-handler-cookbook | 644 |
23 | koheesio | 640 |