Top 19 Python etl-pipeline Projects
-
When Mark Adams and I (Daniel Davis) began working on what has become TrustGraph over 2 years ago, we knew that graph structures would be instrumental in realizing the potential of AI technology, specifically LLMs.
-
Udacity-Data-Engineering-Projects
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing, and Data Lake development.
-
OpenContracts
The open document intelligence platform for builders and hackers - DMS for the agentic world
-
FlashLearn
Integrate LLMs in any pipeline - fit/predict pattern, JSON-driven flows, and built-in concurrency support.
-
Project mention: Show HN: streamable – sync/async iterable streams for Python | news.ycombinator.com | 2026-03-01
-
Flowfile
Flowfile is a visual ETL tool and Python library combining drag-and-drop workflows with Polars dataframes. Build data pipelines visually, define flows programmatically with a Polars-like API, and export to standalone Python code. Perfect for fast, intuitive data processing from development to production.
-
VectorETL
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
-
Project mention: Durable queues, streams, pub/sub, and a cron scheduler – inside your SQLite file | news.ycombinator.com | 2026-04-30
-
prism
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)
-
Project mention: Show HN: DataCompose – PyJanitor-style dataframe cleaning for PySpark | news.ycombinator.com | 2025-08-28
-
insert-tools
CLI tool for inserting SELECT query results into ClickHouse with automatic schema matching and type-safe casting. Ideal for ETL pipelines and SQL-driven data flows.
-
In this tutorial, you'll learn how to build a complete data pipeline using Dotflow — a lightweight Python library that requires zero infrastructure.
-
ticker_selection_BI_dashboard
Data Engineering Project: extracts stock data for 4 shares and uploads it to MySQL for use in a BI project.
-
You can access the repo here: https://github.com/meemeealm/Multithreaded-Ingestion-Pipeline.git
Python etl-pipeline related posts
-
Unstract: Open-source platform to ship document extraction APIs in minutes
-
Unstract: Open-source platform to ship document extraction APIs/MCPs in minutes
-
OpenDataLoader-PDF: An open source tool for structured PDF parsing
-
A note from our sponsor - SaaSHub
www.saashub.com | 18 Jun 2026
Index
What are some of the best open-source etl-pipeline projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | trustgraph | 2,159 |
| 2 | pyspark-example-project | 2,087 |
| 3 | Udacity-Data-Engineering-Projects | 1,907 |
| 4 | OpenContracts | 1,357 |
| 5 | FlashLearn | 607 |
| 6 | streamable | 319 |
| 7 | Flowfile | 313 |
| 8 | VectorETL | 108 |
| 9 | patterns-devkit | 107 |
| 10 | python-sdk | 97 |
| 11 | prism | 88 |
| 12 | onetl | 87 |
| 13 | bitcoinMonitor | 75 |
| 14 | datacompose | 14 |
| 15 | Spooq | 10 |
| 16 | insert-tools | 8 |
| 17 | dotflow | 7 |
| 18 | ticker_selection_BI_dashboard | 4 |
| 19 | Multithreaded-Ingestion-Pipeline | 0 |