Python etl-pipeline

Open-source Python projects categorized as etl-pipeline

Top 19 Python etl-pipeline Projects

  1. trustgraph

    The semantic deployment platform.

    Project mention: The Context Graph Manifesto | dev.to | 2025-12-31

    When Mark Adams and I (Daniel Davis) began working on what has become TrustGraph over 2 years ago, we knew that graph structures would be instrumental in realizing the potential of AI technology, specifically LLMs.

  2. pyspark-example-project

    Implementing best practices for PySpark ETL jobs and applications.

  3. Udacity-Data-Engineering-Projects

    A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.

  4. OpenContracts

    The open document intelligence platform for builders and hackers - DMS for the agentic world

  5. FlashLearn

    Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support.

  6. streamable

    sync/async iterable streams for Python

    Project mention: Show HN: streamable – sync/async iterable streams for Python | news.ycombinator.com | 2026-03-01

  7. Flowfile

    Flowfile is a visual ETL tool and Python library combining drag-and-drop workflows with Polars dataframes. Build data pipelines visually, define flows programmatically with a Polars-like API, and export to standalone Python code. Perfect for fast, intuitive data processing from development to production.

    Project mention: Flowfile v0.8.0 — Your Flows Can Run Themselves Now | dev.to | 2026-03-26

  8. VectorETL

    Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications

  9. patterns-devkit

    Data pipelines from re-usable components

  10. python-sdk

    Conductor OSS SDK for Python programming language (by conductor-oss)

    Project mention: Durable queues, streams, pub/sub, and a cron scheduler – inside your SQLite file | news.ycombinator.com | 2026-04-30

  11. prism

    Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)

  12. onetl

    One ETL tool to rule them all

  13. bitcoinMonitor

    Near real time ETL to populate a dashboard.

  14. datacompose

    Data cleaning for PySpark

    Project mention: Show HN: DataCompose – PyJanitor-style dataframe cleaning for PySpark | news.ycombinator.com | 2025-08-28

  15. Spooq

  16. insert-tools

    CLI tool for inserting SELECT query results into ClickHouse with automatic schema matching and type-safe casting. Ideal for ETL pipelines and SQL-driven data flows.

  17. dotflow

    🎲 Dotflow turns an idea into flow! — Lightweight Python library for execution pipelines

    Project mention: How to Create a Pipeline with Dotflow in Python | dev.to | 2026-04-06

    In this tutorial, you'll learn how to build a complete data pipeline using Dotflow — a lightweight Python library that requires zero infrastructure.

  18. ticker_selection_BI_dashboard

    Data engineering project: extracts stock data for four shares and uploads it to MySQL for use in a BI project.

  19. Multithreaded-Ingestion-Pipeline

    Mini ETL 🔧 ingestion pipeline that works in the bronze layer

    Project mention: Simple Queue Can Save Your Pipeline: DuckDB + Python | dev.to | 2026-04-08

    You can access the repo here: https://github.com/meemeealm/Multithreaded-Ingestion-Pipeline.git

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python etl-pipeline related posts

  • Unstract: Open-source platform to ship document extraction APIs in minutes

    1 project | news.ycombinator.com | 9 Mar 2026
  • Unstract: Open-source platform to ship document extraction APIs/MCPs in minutes

    1 project | news.ycombinator.com | 29 Dec 2025
  • Unstract: Open-source platform to ship document extraction APIs in minutes

    1 project | news.ycombinator.com | 16 Dec 2025
  • Unstract: Open-source platform to ship document extraction APIs/MCPs in minutes

    1 project | news.ycombinator.com | 4 Nov 2025
  • Unstract: Open-source platform to ship document extraction APIs/MCPs in minutes

    1 project | news.ycombinator.com | 30 Sep 2025
  • OpenDataLoader-PDF: An open source tool for structured PDF parsing

    2 projects | news.ycombinator.com | 23 Sep 2025
  • Unstract: Open-source platform to ship document extraction APIs/MCPs in minutes

    1 project | news.ycombinator.com | 16 Sep 2025

Index

What are some of the best open-source etl-pipeline projects in Python? This list will help you:

#   Project                             Stars
1   trustgraph                          2,159
2   pyspark-example-project             2,087
3   Udacity-Data-Engineering-Projects   1,907
4   OpenContracts                       1,357
5   FlashLearn                          607
6   streamable                          319
7   Flowfile                            313
8   VectorETL                           108
9   patterns-devkit                     107
10  python-sdk                          97
11  prism                               88
12  onetl                               87
13  bitcoinMonitor                      75
14  datacompose                         14
15  Spooq                               10
16  insert-tools                        8
17  dotflow                             7
18  ticker_selection_BI_dashboard       4
19  Multithreaded-Ingestion-Pipeline    0
