Python Pipeline

Open-source Python projects categorized as Pipeline

Top 23 Python Pipeline Projects

  1. serve

    ☁️ Build multimodal AI applications with cloud-native stack

  3. Prefect

    The easiest way to build, run, and monitor data pipelines at scale.

    Project mention: Show HN: Flow – A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02

    - https://github.com/PrefectHQ/prefect

  4. airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Project mention: Migrate connectors from MIT to ELv2 – Pull Request #63723 – airbytehq/airbyte | news.ycombinator.com | 2025-08-15
  5. Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

    Project mention: Top 40 Open-source Developer Tools with the Most GitHub Stars | dev.to | 2025-04-20

    GitHub: https://github.com/Avaiga/taipy

  6. marimo

    Transform data, train models, and run SQL with marimo — feels like a next-gen reactive notebook, stored as Git-friendly Python. Deploy as scripts, pipelines, endpoints, and apps. All from an AI-native editor (or your own).

    Project mention: Show HN: OverType – A Markdown WYSIWYG editor that's just a textarea | news.ycombinator.com | 2025-08-17

    I thought it should be extremely portable ("everything just works, it's native"), but it doesn't work on iOS 9.3.6. It doesn't even let me input text into the textarea...

    A natural extension seems to be a source code editor with syntax highlighting, like those used in https://marimo.io/, Jupyter, https://plutojl.org/ and other notebook-like Web editors.

  7. great_expectations

    Always know what to expect from your data.

    Project mention: validatelite VS great_expectations - a user suggested alternative | libhunt.com/r/validatelite | 2025-08-08

    Great Expectations is a popular open-source data validation framework with rich features and integrations, but it has a steeper learning curve and heavier setup. ValidateLite offers a lightweight, zero-config CLI alternative for quick checks and automation.

  8. Kedro

    Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

    Project mention: Don't Know These 6 Tools? No Wonder Your Python Development Is So Slow | dev.to | 2025-07-10

    👉 https://kedro.org/

  10. Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

    Project mention: Wk 3 Orchestration: MLOPs with DataTalks | dev.to | 2025-02-22

    Here, we use the free Mage Ai orchestration tool.

  11. papermill

    📚 Parameterize, execute, and analyze notebooks

    Project mention: Jupyter Notebooks as E2E Tests | news.ycombinator.com | 2024-12-18
  12. AutoRAG

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

    Project mention: AIM Weekly 28 Oct 2024 | dev.to | 2024-10-28

    📎 AutoRAG with Milvus 🛠️ ADO 🫶 Self Hosting LLM 🌐 Noema Declarative AI 📝 New NIM Blueprint for building AI Virtual Assistant 🚙 Zilliz Integrations 🫶 Using Milvus for Semantic Search 🤖 Contextual Retrieval 📎 Meta: Quantized Light Weight Models 🚙 https://arxiv.org/pdf/2407.01219 ✅ Cool Icons 🙌 IBM Watson AI Milvus Bot 📎 The Hacker's Browser 🛠️ Small and Mighty H2O Model 📝 Zilliz Cloud vs Qdrant 💫 Gravatino and Agents 🛠️ OSS Summit Europe 2024 Report ▶️ RAG Strategi 🤖 MS AI Data Visualizations 🌐 Graph RAG 👽 South Bay Meetup 15 Oct 2024 🦾 Influx and Milvus 👽 Multimodal Pipelines ✨ Constrained Sampling from LLM 🚕 BAML: Cheaper, Fast and More Accurate Function Calling 📊 Infinite World Generation with outlines txt 💻 Ollama Client Swift 🍔 Atomic Agents 🕶️ PYMUPDF4LLM 🚕 Milvus for AI Agents 📊 Fine Tuning LLAMA 3 with ORPO 🦾 Run NVIDIA Models 💻 LLM Training Meta Lingua ✨ 1 Bit LLM - MS BitNet 💻 Intro 🕶️ Mastering Chunk 📊 Storm Stanford Tool 🐍 DAMO NLP SG CaRing 🍔 LLM Reasoners

  13. pipelines

    Machine Learning Pipelines for Kubeflow

  14. towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  15. PyFunctional

    Python library for creating data pipelines with chain functional programming

  16. instill-core

    🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications

    Project mention: Revolutionizing Unstructured Data: Instill Core – Your All-in-One AI Solution | dev.to | 2025-07-14


  17. mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  18. galaxy

    Data intensive science for everyone.

  19. pytorch-toolbelt

    PyTorch extensions for fast R&D prototyping and Kaggle farming

  20. MLBox

    MLBox is a powerful Automated Machine Learning python library.

  21. sematic

    An open-source ML pipeline development platform

  22. toil

    A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.

  23. NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

  24. aws-lambda-handler-cookbook

    This repository provides a working, deployable, open source-based, serverless service blueprint with an AWS Lambda function and AWS CDK Python code with all the best practices and a complete CI/CD pipeline.

    Project mention: Protect Your API Gateway with AWS WAF using CDK | dev.to | 2024-12-15

    The ‘orders’ service allows users to order products. We will use my open-source Serverless template project: AWS Lambda Handler Cookbook.

  25. koheesio

    Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Python Pipeline related posts

  • Migrate connectors from MIT to ELv2 – Pull Request #63723 – airbytehq/airbyte

    1 project | news.ycombinator.com | 15 Aug 2025
  • It's 2025: Your Python Toolbox Is More Than Just PyCharm

    3 projects | dev.to | 31 Jul 2025
  • A Python-first data lakehouse

    1 project | news.ycombinator.com | 21 Jun 2025
  • Personal Picks: Data Product News (April 16, 2025)

    1 project | dev.to | 15 Apr 2025
  • airbyte VS cocoindex - a user suggested alternative

    2 projects | 1 Apr 2025
  • Top 17 DevOps AI Tools [2025]

    4 projects | dev.to | 12 Mar 2025
  • Can AI finally generate best practice code? I think so.

    2 projects | dev.to | 19 Dec 2024

Index

What are some of the best open-source Pipeline projects in Python? This list will help you:

#   Project                       Stars
1   serve                         21,710
2   Prefect                       20,223
3   airbyte                       19,349
4   Taipy                         18,572
5   marimo                        15,620
6   great_expectations            10,674
7   Kedro                         10,505
8   Mage                           8,454
9   papermill                      6,252
10  AutoRAG                        4,216
11  pipelines                      3,923
12  towhee                         3,413
13  PyFunctional                   2,468
14  instill-core                   2,282
15  mara-pipelines                 2,080
16  galaxy                         1,589
17  pytorch-toolbelt               1,556
18  MLBox                          1,520
19  sematic                          996
20  toil                             915
21  NeumAI                           859
22  aws-lambda-handler-cookbook      644
23  koheesio                         640


Did you know that Python is the 2nd most popular programming language based on number of references?