Python Pipeline

Open-source Python projects categorized as Pipeline

Top 23 Python Pipeline Projects

  1. serve

    ☁️ Build multimodal AI applications with cloud-native stack

  2. Judoscale

    Save 47% on cloud hosting with autoscaling that just works. Judoscale integrates with Django, FastAPI, Celery, and RQ to make autoscaling easy and reliable. Save big, and say goodbye to request timeouts and backed-up task queues.

  3. Prefect

    The easiest way to build, run, and monitor data pipelines at scale.

    Project mention: Show HN: Flow – A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02

    - https://github.com/PrefectHQ/prefect

  4. Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

    Project mention: Top 40 Open-source Developer Tools with the Most GitHub Stars | dev.to | 2025-04-20

    GitHub: https://github.com/Avaiga/taipy

  5. airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Project mention: Personal Picks: Data Product News (April 16, 2025) | dev.to | 2025-04-15
  6. marimo

    A reactive notebook for Python: run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. All in a modern, AI-native editor.

    Project mention: Atuin Desktop: Runbooks That Run | news.ycombinator.com | 2025-04-22

    linky https://github.com/marimo-team/marimo#:~:text=all%20in%20a%2... (Apache 2)

  7. great_expectations

    Always know what to expect from your data.

  8. Kedro

    Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

    Project mention: 20 Open Source Tools I Recommend to Build, Share, and Run AI Projects | dev.to | 2024-11-13

    Kedro is an ML development framework that takes data science projects from pilot to production by encouraging reproducible, maintainable, and modular data science code. To do this, it provides a data catalog for data handling, support for pipeline building, and a standardized project template for code maintainability and consistency. The data catalog uses lightweight data connectors to manage and track datasets, which lets you reuse the same pipeline for multiple production-level workloads across your system.
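The catalog-plus-pipeline idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Kedro's actual API: `run_pipeline`, the tuple-based node format, and the dataset names are all hypothetical and exist only for this example.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for a data catalog: dataset names mapped to in-memory data.
Catalog = Dict[str, Any]

# A node is (function, names of input datasets, name of the output dataset).
Node = Tuple[Callable[..., Any], List[str], str]


def run_pipeline(nodes: List[Node], catalog: Catalog) -> Catalog:
    """Run nodes in order, reading inputs from and writing outputs to the catalog."""
    for func, inputs, output in nodes:
        args = [catalog[name] for name in inputs]
        catalog[output] = func(*args)
    return catalog


def drop_missing(rows: List[Any]) -> List[float]:
    # Remove entries that are None.
    return [r for r in rows if r is not None]


def mean(rows: List[float]) -> float:
    # Arithmetic mean of a non-empty list.
    return sum(rows) / len(rows)


# The pipeline only refers to datasets by name, so the same node list
# can be run against a different catalog without changes.
pipeline: List[Node] = [
    (drop_missing, ["raw_values"], "clean_values"),
    (mean, ["clean_values"], "average"),
]

catalog = run_pipeline(pipeline, {"raw_values": [1.0, None, 3.0]})
print(catalog["average"])
```

Because the nodes reference datasets only by name, swapping the catalog (for example, test data versus production data) leaves the pipeline definition untouched, which is the reuse the paragraph describes.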

  9. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.

  10. Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

    Project mention: Wk 3 Orchestration: MLOPs with DataTalks | dev.to | 2025-02-22

    Here, we use the free Mage Ai orchestration tool.

  11. papermill

    📚 Parameterize, execute, and analyze notebooks

    Project mention: Jupyter Notebooks as E2E Tests | news.ycombinator.com | 2024-12-18
  12. AutoRAG

    AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

    Project mention: AIM Weekly 28 Oct 2024 | dev.to | 2024-10-28

    AutoRAG with Milvus; ADO; Self Hosting LLM; Noema Declarative AI; New NIM Blueprint for building AI Virtual Assistant; Zilliz Integrations; Using Milvus for Semantic Search; Contextual Retrieval; Meta: Quantized Light Weight Models; https://arxiv.org/pdf/2407.01219; Cool Icons; IBM Watson AI Milvus Bot; The Hacker's Browser; Small and Mighty H2O Model; Zilliz Cloud vs Qdrant; Gravatino and Agents; OSS Summit Europe 2024 Report; RAG Strategi; MS AI Data Visualizations; Graph RAG; South Bay Meetup 15 Oct 2024; Influx and Milvus; Multimodal Pipelines; Constrained Sampling from LLM; BAML: Cheaper, Fast and More Accurate Function Calling; Infinite World Generation with outlines txt; Ollama Client Swift; Atomic Agents; PYMUPDF4LLM; Milvus for AI Agents; Fine Tuning LLAMA 3 with ORPO; Run NVIDIA Models; LLM Training Meta Lingua; 1 Bit LLM - MS BitNet; Intro; Mastering Chunk; Storm Stanford Tool; DAMO NLP SG CaRing; LLM Reasoners

  13. pipelines

    Machine Learning Pipelines for Kubeflow

  14. towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  15. PyFunctional

    Python library for creating data pipelines with chain functional programming

  16. mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  17. pytorch-toolbelt

    PyTorch extensions for fast R&D prototyping and Kaggle farming

  18. MLBox

    MLBox is a powerful Automated Machine Learning python library.

  19. galaxy

    Data intensive science for everyone.

  20. sematic

    An open-source ML pipeline development platform

  21. toil

    A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.

  22. NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

  23. koheesio

    Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

    Project mention: This Week in Python | dev.to | 2024-06-07

    koheesio – framework for building efficient data pipelines

  24. pypyr automation task runner

    pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.

  25. aws-lambda-handler-cookbook

    This repository provides a working, deployable, open source-based, serverless service blueprint with an AWS Lambda function and AWS CDK Python code with all the best practices and a complete CI/CD pipeline.

    Project mention: Protect Your API Gateway with AWS WAF using CDK | dev.to | 2024-12-15

    The ‘orders’ service allows users to order products. We will use my open-source Serverless template project: AWS Lambda Handler Cookbook.

  26. InfluxDB

    InfluxDB high-performance time series database. Collect, organize, and act on massive volumes of high-resolution data to power real-time intelligent systems.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Pipeline related posts

  • Personal Picks: Data Product News (April 16, 2025)

    1 project | dev.to | 15 Apr 2025
  • airbyte VS cocoindex - a user suggested alternative

    2 projects | 1 Apr 2025
  • Top 17 DevOps AI Tools [2025]

    4 projects | dev.to | 12 Mar 2025
  • Can AI finally generate best practice code? I think so.

    2 projects | dev.to | 19 Dec 2024
  • Jupyter Notebooks as E2E Tests

    8 projects | news.ycombinator.com | 18 Dec 2024
  • Explorer l'API de 360Learning : de l'agilité de Power Query à la robustesse de la Modern Data Stack

    1 project | dev.to | 14 Dec 2024
  • Airbyte 1.0 Released

    1 project | news.ycombinator.com | 24 Sep 2024

Index

What are some of the best open-source Pipeline projects in Python? This list will help you:

# | Project | Stars
1 | serve | 21,523
2 | Prefect | 19,016
3 | Taipy | 17,996
4 | airbyte | 17,903
5 | marimo | 12,524
6 | great_expectations | 10,334
7 | Kedro | 10,276
8 | Mage | 8,264
9 | papermill | 6,137
10 | AutoRAG | 3,856
11 | pipelines | 3,802
12 | towhee | 3,359
13 | PyFunctional | 2,420
14 | mara-pipelines | 2,082
15 | pytorch-toolbelt | 1,537
16 | MLBox | 1,503
17 | galaxy | 1,483
18 | sematic | 986
19 | toil | 909
20 | NeumAI | 854
21 | koheesio | 635
22 | pypyr automation task runner | 627
23 | aws-lambda-handler-cookbook | 610
