Top 12 Python etl-pipeline Projects
-
Udacity-Data-Engineering-Projects
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
-
prism
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)
-
dados-censup
Automates the ingestion of data published by INEP on the Brazilian higher education census.
-
pipeline-docs-data-extractor
ETL-Texts aims to be a simple and efficient pipeline designed for extracting, translating, cleaning, and transforming text files.
-
workshop-realtime-data-pipelines
You will inspect and run a sample architecture making use of Apache Pulsar™ and Pulsar Functions for real-time, event-streaming-based data ingestion, cleaning and processing.
-
ticker_selection_BI_dashboard
Data engineering project: extracts stock data for 4 shares and uploads it to MySQL for use in a BI project.
Fast-changing libraries are a huge pain. That's why a no-code approach like Unstract (https://github.com/zipstack/unstract) makes sense.
Project mention: Prism: the easiest way to create robust data workflows. Accessible via CLI | /r/coolgithubprojects | 2023-09-21
I have a sample code here that pulls data from an API and loads it into a DB, scheduled by cron, that can help with some ideas.
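The pattern that post describes (pull from an API, load into a DB, let cron handle scheduling) can be sketched roughly as follows. This is not the poster's code: the endpoint URL, the `items` table, the `id`/`name` fields, and the use of SQLite are all assumptions made for illustration.

```python
import json
import sqlite3
from urllib.request import urlopen

# Hypothetical endpoint, assumed to return a JSON array of objects.
API_URL = "https://example.com/api/items"


def extract(url=API_URL):
    """Fetch the raw records from the API."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def transform(records):
    """Drop records without an id and tidy the (assumed) name field."""
    return [
        (int(r["id"]), str(r.get("name", "")).strip())
        for r in records
        if r.get("id") is not None
    ]


def load(rows, conn):
    """Upsert rows so that repeated scheduled runs do not duplicate data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO items (id, name) VALUES (?, ?)", rows
    )
    conn.commit()


def run(conn, fetch=extract):
    """Run one extract-transform-load pass and return the row count."""
    rows = transform(fetch())
    load(rows, conn)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect("etl.db")  # hypothetical database file
    try:
        print(f"loaded {run(conn)} rows")
    finally:
        conn.close()
```

Saved as, say, `etl.py`, the script could be scheduled hourly with a crontab entry such as `0 * * * * /usr/bin/python3 /path/to/etl.py` (both paths are placeholders). Keeping the load step idempotent matters here because cron will happily re-run a job over data it has already seen.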
Hi everyone, this is my first DE project: Baitur5/reddit_api_elt (github.com). It is a data pipeline that extracts Reddit data for a Google Data Studio report, focusing on a specific subreddit. Can you check it out and give some advice and tips on how to improve it, or on what I should add next?
Python etl-pipeline related posts
- Ask HN: Is RAG the Future of LLMs?
- Show HN: LLMWhisperer – Prep complex documents ready for use in LLMs
- RAGFlow is an open-source RAG engine based on deep document understanding
- Running OCR against PDFs and images directly in the browser
- Prism: the easiest way to create robust data workflows. Accessible via CLI
- Show HN: Prism – a framework for creating robust data science workflows
- Show HN: Prism – Data Orchestration in Python
Index
What are some of the best open-source etl-pipeline projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | pyspark-example-project | 1,370 |
2 | Udacity-Data-Engineering-Projects | 1,295 |
3 | patterns-devkit | 106 |
4 | unstract | 106 |
5 | prism | 79 |
6 | bitcoinMonitor | 53 |
7 | Spooq | 8 |
8 | dados-censup | 6 |
9 | pipeline-docs-data-extractor | 5 |
10 | workshop-realtime-data-pipelines | 3 |
11 | ticker_selection_BI_dashboard | 2 |
12 | reddit_api_elt | 2 |