Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality. Learn more →
Top 16 Python elt Projects
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
-
meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
astro-sdk
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
-
reddit-detective
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
-
sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
-
Meltano Singer SDK
Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com (by meltano)
-
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12I'l also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success with integrating Salesforce to a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.
It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.
Project mention: A mage on the Hero’s Journey: a fantasy epic on how a startup rose from the ashes | dev.to | 2023-06-12In the coming years, Mage will create a cooperative experience so that developers can build data pipelines with their team and level up together. After that journey, Mage will go on an epic quest to create the 1st open world community experience in the data universe.
Project mention: Ask HN: Freelancer? Seeking freelancer? (December 2023) | news.ycombinator.com | 2023-12-03SEEKING FREELANCER | REMOTE | GERMANY
dltHub is looking for a freelance help in the following repos:
Project mention: meltano VS cloudquery - a user suggested alternative | libhunt.com/r/meltano | 2023-06-02
Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.
Here's the project: https://github.com/vmware/versatile-data-kit
Project mention: Orchestration: Thoughts on Dagster, Airflow and Prefect? | /r/dataengineering | 2023-06-01Have you tried the Astro SDK? https://github.com/astronomer/astro-sdk
Project mention: Is there something wrong with me, I hate dbt, what am I missing ? | /r/dataengineering | 2023-05-15This just feels like you aren’t using the plentiful tools to make those “mind-numbingly slow” dev steps faster. For ex., using dbt-coves to generate the staging models with casting to types in a couple clicks. And pulling directly from Fivetran tables is just poor practice, with the additional steps needed to do it “right” being inconsequential at best.
Hi everyone, this is my first DE project. Baitur5/reddit_api_elt (github.com) . It is basically about a data pipeline that extracts Reddit data for a Google Data Studio report, focusing on a specific subreddit Can you guys check it out , and give some advice & tips on how to improve it or the next things I should add.
Python elt related posts
- Show HN: Meltano Cloud (Gitlab spinout) – Managed infra for open source ELT
- DBT lays off 15% of their staff
- SQL Mesh - Auto DAG generation!!
- SQL Mesh - Auto DAG generation!!
- Data transformation tools other than DBT
- Semantic Understanding of SQL
- Virtual Data Environments
-
A note from our sponsor - InfluxDB
www.influxdata.com | 19 Apr 2024
Index
What are some of the best open-source elt projects in Python? This list will help you:
Project | Stars | |
---|---|---|
1 | Airflow | 34,397 |
2 | airbyte | 13,821 |
3 | dbt-core | 8,842 |
4 | Mage | 6,953 |
5 | dlt | 1,694 |
6 | meltano | 1,577 |
7 | sqlmesh | 1,231 |
8 | dbt-metabase | 422 |
9 | versatile-data-kit | 409 |
10 | astro-sdk | 315 |
11 | reddit-detective | 206 |
12 | dbt-coves | 204 |
13 | sayn | 117 |
14 | Meltano Singer SDK | 83 |
15 | dbd | 55 |
16 | reddit_api_elt | 2 |