Top 12 Python etl-pipeline Projects

pyspark-example-project

1 1,370 0.0 Python

Implementing best practices for PySpark ETL jobs and applications.
Udacity-Data-Engineering-Projects

5 1,295 0.0 Python

A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.

Project mention: Pitanje za data engineering? ("A question about data engineering?") | /r/programiranje | 2023-06-30

patterns-devkit

5 106 2.9 Python

Data pipelines from re-usable components
unstract

5 106 9.5 Python

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

Project mention: Ask HN: Is RAG the Future of LLMs? | news.ycombinator.com | 2024-04-14

Fast changing libraries are a huge pain. That's why a no-code approach like Unstract (https://github.com/zipstack/unstract) makes sense.

prism

7 79 8.9 Python

Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)

Project mention: Prism: the easiest way to create robust data workflows. Accessible via CLI | /r/coolgithubprojects | 2023-09-21

bitcoinMonitor

1 53 3.3 Python

Near real time ETL to populate a dashboard.

Project mention: Best place to learn APIS? | /r/dataengineering | 2023-06-26

I have a sample code here that pulls data from an API and loads it into a DB, scheduled by cron, that can help with some ideas.
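
The pattern this mention describes (pull data from an API, load it into a database, let cron handle the schedule) can be sketched as below. This is a minimal illustration, not bitcoinMonitor's actual code: the endpoint URL, the `prices` table, and all function names are hypothetical, and SQLite stands in for whatever database the project really uses.

```python
import json
import sqlite3
from urllib.request import urlopen

# Hypothetical endpoint, assumed to return a JSON array such as
# [{"id": "bitcoin", "price": 123.4}, ...]
API_URL = "https://example.com/api/prices"


def fetch_from_api(url=API_URL):
    """Extract: download and parse the JSON records."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def load(conn, rows):
    """Load: upsert the records and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices (id TEXT PRIMARY KEY, price REAL)"
    )
    # INSERT OR REPLACE keeps repeated runs from duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO prices (id, price) VALUES (:id, :price)", rows
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]


def run(conn, fetch=fetch_from_api):
    """One ETL pass; `fetch` is injectable so the load step can be tested offline."""
    return load(conn, fetch())
```

Scheduling stays outside the code. If the script ends with a call such as `run(sqlite3.connect("prices.db"))`, a crontab entry like `*/5 * * * * python3 /path/to/etl.py` (path hypothetical) would rerun it every five minutes.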

Spooq

1 8 7.4 Python
dados-censup

1 6 4.9 Python

Automation of the ingestion of data published by INEP on the Brazilian higher education census.
pipeline-docs-data-extractor

1 5 7.8 Python

ETL-Texts aims to be a simple and efficient pipeline designed for extracting, translating, cleaning, and transforming text files.

Project mention: ETL Texts | news.ycombinator.com | 2024-01-14

workshop-realtime-data-pipelines

1 3 2.3 Python

You will inspect and run a sample architecture making use of Apache Pulsar™ and Pulsar Functions for real-time, event-streaming-based data ingestion, cleaning and processing.
ticker_selection_BI_dashboard

2 2 10.0 Python

Data engineering project: extracts stock data for four shares and uploads it to MySQL for use in a BI project.
reddit_api_elt

1 2 7.0 Python

Project mention: Reddit ELT Pipeline | /r/dataengineering | 2023-12-11

Hi everyone, this is my first DE project: Baitur5/reddit_api_elt (github.com). It is basically a data pipeline that extracts Reddit data for a Google Data Studio report, focusing on a specific subreddit. Can you check it out and give some advice and tips on how to improve it, or on what I should add next?

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python etl-pipeline related posts

Ask HN: Is RAG the Future of LLMs?
2 projects | news.ycombinator.com | 14 Apr 2024
Show HN: LLMWhisperer – Prep complex documents ready for use in LLMs
1 project | news.ycombinator.com | 10 Apr 2024
RAGFlow is an open-source RAG engine based on deep document understanding
8 projects | news.ycombinator.com | 1 Apr 2024
Running OCR against PDFs and images directly in the browser
7 projects | news.ycombinator.com | 30 Mar 2024
Prism: the easiest way to create robust data workflows. Accessible via CLI
1 project | /r/coolgithubprojects | 21 Sep 2023
Show HN: Prism – a framework for creating robust data science workflows
1 project | news.ycombinator.com | 1 Sep 2023
Show HN: Prism – Data Orchestration in Python
1 project | news.ycombinator.com | 28 Jul 2023

Index

What are some of the best open-source etl-pipeline projects in Python? This list will help you:

	Project	Stars
1	pyspark-example-project	1,370
2	Udacity-Data-Engineering-Projects	1,295
3	patterns-devkit	106
4	unstract	106
5	prism	79
6	bitcoinMonitor	53
7	Spooq	8
8	dados-censup	6
9	pipeline-docs-data-extractor	5
10	workshop-realtime-data-pipelines	3
11	ticker_selection_BI_dashboard	2
12	reddit_api_elt	2