Top 23 Python Pipeline Projects
-
Prefect
Project mention: Show HN: Flow - A Dynamic Task Engine for AI Agents Without DAG | news.ycombinator.com | 2024-12-02
- https://github.com/PrefectHQ/prefect
-
Taipy
Project mention: Top 40 Open-source Developer Tools with the Most GitHub Stars | dev.to | 2025-04-20
GitHub: https://github.com/Avaiga/taipy
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
marimo
A reactive notebook for Python: run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. All in a modern, AI-native editor.
linky https://github.com/marimo-team/marimo#:~:text=all%20in%20a%2... (Apache 2)
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
Project mention: 20 Open Source Tools I Recommend to Build, Share, and Run AI Projects | dev.to | 2024-11-13
Kedro is an ML development framework that takes data science projects from pilot to production by helping you write reproducible, maintainable, and modular data science code. To do this, it provides a data catalog for data handling, support for pipeline building, and a standardized project template for maintainability and consistency. The data catalog uses lightweight data connectors to manage and track datasets, which lets you use the same pipeline to build production-level code in multiple places across your system.
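The catalog-plus-pipeline idea described above can be sketched in plain Python. This is only an illustration of the concept, not Kedro's API: `Node`, `InMemoryCatalog`, `run_pipeline`, and the dataset names are all hypothetical, and a real catalog would read from and write to files or databases rather than a dictionary.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """One pipeline step: a function plus the catalog names it reads and writes."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


class InMemoryCatalog:
    """Hypothetical stand-in for a data catalog: maps dataset names to data."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self._data = dict(initial)

    def load(self, name: str) -> Any:
        return self._data[name]

    def save(self, name: str, value: Any) -> None:
        self._data[name] = value


def run_pipeline(nodes: List[Node], catalog: InMemoryCatalog) -> None:
    """Run nodes in the order given, passing data only through the catalog."""
    for node in nodes:
        args = [catalog.load(name) for name in node.inputs]
        catalog.save(node.output, node.func(*args))


def drop_missing(rows):
    """Remove None entries from a list of values."""
    return [r for r in rows if r is not None]


def mean(rows):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(rows) / len(rows)


# The pipeline refers to datasets by name only, so the same node list can be
# run against a different catalog (for example test data vs. production data).
pipeline = [
    Node(drop_missing, ["raw_values"], "clean_values"),
    Node(mean, ["clean_values"], "average"),
]

catalog = InMemoryCatalog({"raw_values": [1.0, None, 3.0]})
run_pipeline(pipeline, catalog)
print(catalog.load("average"))
```

Because the nodes never touch storage directly, swapping the catalog is enough to point the same pipeline at different data, which is the reuse property the description refers to.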
-
Mage
The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Here, we use the free Mage AI orchestration tool.
-
AutoRAG
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
-
toil
A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
-
NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
-
koheesio
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
koheesio: framework for building efficient data pipelines
-
pypyr automation task runner
pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.
-
aws-lambda-handler-cookbook
This repository provides a working, deployable, open source-based, serverless service blueprint with an AWS Lambda function and AWS CDK Python code with all the best practices and a complete CI/CD pipeline.
The "orders" service allows users to order products. We will use my open-source Serverless template project: AWS Lambda Handler Cookbook.
Python Pipeline related posts
-
Personal Picks: Data Product News (April 16, 2025)
-
airbyte VS cocoindex - a user-suggested alternative
2 projects | 1 Apr 2025
-
Top 17 DevOps AI Tools [2025]
-
Can AI finally generate best practice code? I think so.
-
Jupyter Notebooks as E2E Tests
-
Exploring the 360Learning API: from the agility of Power Query to the robustness of the Modern Data Stack
-
Airbyte 1.0 Released
-
A note from our sponsor - InfluxDB
influxdata.com | 25 Apr 2025
Index
What are some of the best open-source Pipeline projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | serve | 21,523 |
2 | Prefect | 19,016 |
3 | Taipy | 17,996 |
4 | airbyte | 17,903 |
5 | marimo | 12,524 |
6 | great_expectations | 10,334 |
7 | Kedro | 10,276 |
8 | Mage | 8,264 |
9 | papermill | 6,137 |
10 | AutoRAG | 3,856 |
11 | pipelines | 3,802 |
12 | towhee | 3,359 |
13 | PyFunctional | 2,420 |
14 | mara-pipelines | 2,082 |
15 | pytorch-toolbelt | 1,537 |
16 | MLBox | 1,503 |
17 | galaxy | 1,483 |
18 | sematic | 986 |
19 | toil | 909 |
20 | NeumAI | 854 |
21 | koheesio | 635 |
22 | pypyr automation task runner | 627 |
23 | aws-lambda-handler-cookbook | 610 |