Top 17 Python data-pipeline Projects
pathway
Project mention: Fullstack Open Source Projects That Will Help You Become AI Devs (Python, JavaScript, AI) | dev.to | 2025-05-27
Give Pathway a try: https://github.com/pathwaycom/pathway 🌟 Pathway on GitHub
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Here, we use the free Mage AI orchestration tool.
preswald
Preswald is a WASM packager for Python-based interactive data apps: it bundles complex data workflows, particularly visualizations, into single files that run entirely in the browser using Pyodide, DuckDB, pandas, Plotly, Matplotlib, and similar libraries. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
Project mention: Revolutionizing Data Apps: Build Interactive Dashboards with Just Python! | dev.to | 2025-03-19
docetl
Project mention: DocETL – open-source framework for complex document processing pipelines | news.ycombinator.com | 2024-10-21
meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
pyper – Concurrent Python made simple
dbt-data-reliability
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
conductor-python
Take a look at https://github.com/conductor-sdk/conductor-python, which is easier and won't force you to write code for a specific framework.
Doctor Droid (https://drdroid.io) | ML Engineer (15-20LPA INR) | Bangalore, INDIA | ONSITE | Full-time
We are a DevTool. We help engineers debug production issues faster. We do this through our open-source automation tooling for on-call and SRE engineers. We work with tech enterprises to help optimise their DevEx / on-call routines.
Requirement:
dagster-odp
A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code
Project mention: Declarative Data Pipelines: Moving from Code to Configuration | dev.to | 2025-02-04
To demonstrate how dagster-odp brings these concepts together, we'll implement the same S3 to BigQuery pipeline we discussed earlier, but using a declarative approach. The complete implementation consists of three main components: resource configuration, task definition, and workflow configuration.
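The three-part split described above (resources, tasks, workflow configuration) can be illustrated with a small, self-contained Python sketch. It is not dagster-odp's actual API or YAML schema: the `register_task` decorator, the JSON keys (`tasks`, `id`, `type`, `params`), and the in-memory stand-ins for S3 and BigQuery are all hypothetical, chosen only to show how configuration can decide which code runs.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical task registry (NOT dagster-odp's API): maps a task "type"
# name used in configuration to a plain Python function.
TASK_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_task(name: str):
    """Register a function under a task-type name so config can refer to it."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        TASK_REGISTRY[name] = func
        return func
    return decorator


# 1. Resource configuration: shared settings/clients handed to every task.
#    In-memory dicts stand in for a real S3 bucket and BigQuery dataset.
RESOURCES_JSON = """
{
  "source_bucket": {"files": {"sales.csv": "id,amount\\n1,10\\n2,32"}},
  "warehouse": {"tables": {}}
}
"""


# 2. Task definitions: ordinary functions parameterized by config values.
@register_task("extract_csv")
def extract_csv(resources: dict, params: dict, inputs: dict) -> List[dict]:
    """Read a CSV 'object' from the fake bucket and parse it into row dicts."""
    raw = resources["source_bucket"]["files"][params["key"]]
    header, *rows = raw.strip().split("\n")
    cols = header.split(",")
    return [dict(zip(cols, row.split(","))) for row in rows]


@register_task("load_table")
def load_table(resources: dict, params: dict, inputs: dict) -> int:
    """Write an upstream task's rows into the fake warehouse; return row count."""
    rows = inputs[params["from"]]
    resources["warehouse"]["tables"][params["table"]] = rows
    return len(rows)


# 3. Workflow configuration: which tasks run, in what order, with what params.
WORKFLOW_JSON = """
{
  "tasks": [
    {"id": "extract", "type": "extract_csv", "params": {"key": "sales.csv"}},
    {"id": "load", "type": "load_table",
     "params": {"from": "extract", "table": "sales"}}
  ]
}
"""


def run_workflow(workflow_json: str, resources_json: str) -> Tuple[Dict[str, Any], dict]:
    """Run tasks in listed order; each task sees earlier outputs keyed by task id."""
    resources = json.loads(resources_json)
    workflow = json.loads(workflow_json)
    outputs: Dict[str, Any] = {}
    for spec in workflow["tasks"]:
        func = TASK_REGISTRY[spec["type"]]
        outputs[spec["id"]] = func(resources, spec.get("params", {}), outputs)
    return outputs, resources


if __name__ == "__main__":
    outputs, resources = run_workflow(WORKFLOW_JSON, RESOURCES_JSON)
    print(outputs["load"])
    print(resources["warehouse"]["tables"]["sales"])
```

Keeping tasks as plain functions and the wiring in data means a new pipeline can be added by writing configuration alone, as long as the task types it needs already exist. This sketch simply runs tasks in the order they are listed; a real orchestrator such as Dagster resolves dependencies as a graph instead.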
SmartPipeline
A framework for rapid development of robust data pipelines following a simple design pattern
analytics_data_where_house
An analytics engineering sandbox focusing on real estate prices in Cook County, IL
Project mention: Show HN: OpenTimes – Free travel times between U.S. Census geographies | news.ycombinator.com | 2025-03-17
Thank you for this excellent post! I've been developing [my own platform](https://github.com/MattTriano/analytics_data_where_house) that curates a data warehouse mostly of census and socrata datasets but I haven't really had a good way to share the products with anyone as it's a bit too heavyweight. I've been trying to find alternate solutions to that issue (I'm currently building out a much smaller [platform](https://github.com/MattTriano/fbi_cde_data) to process the FBI's NIBRS datasets), and your post has given me a few great implementations to study and experiment with.
Thanks!
Python data-pipelines related posts
Dagster
Personal Picks: Data Product News (March 19, 2025)
Wk 3 Orchestration: MLOPs with DataTalks
Show HN: Pyper – Concurrent Python Made Simple
Monolith to Microservices: Should I Migrate and How?
AI Strategy Guide: How to Scale AI Across Your Business
Experience with Dagster.io?
A note from our sponsor - SaaSHub
www.saashub.com | 24 Jun 2025
Index
What are some of the best open-source data-pipeline projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Airflow | 40,629 |
2 | pathway | 27,601 |
3 | dagster | 13,411 |
4 | Mage | 8,382 |
5 | preswald | 4,149 |
6 | docetl | 2,292 |
7 | meltano | 2,114 |
8 | pyper | 1,433 |
9 | versatile-data-kit | 450 |
10 | dbt-data-reliability | 444 |
11 | recap | 343 |
12 | patterns-devkit | 108 |
13 | conductor-python | 75 |
14 | kenobi | 59 |
15 | dagster-odp | 33 |
16 | SmartPipeline | 27 |
17 | analytics_data_where_house | 9 |