Top 18 Python elt Projects
-
Airflow
Hi HN,
We've built an SDK for building DAGs / data pipelines with LLMs in Apache Airflow [1] using Pydantic AI [2] under the hood. I've seen success across the board with Airflow users building simple LLM workflows before moving on to "AI agents". In my experience, the noise around building agents means that people forget that there are other ways to get more immediate value out of LLMs.
Coupling Airflow for orchestration and Pydantic AI for LLM interactions has turned out to be a very pragmatic approach to building these workflows (and agents). Neither tool "gets in the way" of what you're trying to do. Airflow's been around for 10+ years and has a very well-built orchestration engine rich with everything you need to write production grade data pipelines, and Pydantic AI's been a refreshing take on working with LLMs.
Would love some feedback from this community!
[1] https://github.com/apache/airflow
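As a rough illustration of the idea in this post (an LLM step treated as just one task in a dependency-ordered pipeline), here is a minimal standard-library sketch. It does not use Airflow or Pydantic AI. The task names are invented for the example, and `fake_llm_summarize` is a hypothetical stand-in for a real model call.

```python
from graphlib import TopologicalSorter


def extract(_results):
    # Pretend these rows came from an upstream source.
    return ["airflow schedules tasks", "pydantic validates data"]


def fake_llm_summarize(results):
    # Hypothetical stand-in for an LLM call: keeps the first word of each row.
    return [row.split()[0] for row in results["extract"]]


def load(results):
    # Final step: join the "summaries" into one string.
    return ", ".join(results["summarize"])


# Each task name maps to (callable, set of upstream task names).
TASKS = {
    "extract": (extract, set()),
    "summarize": (fake_llm_summarize, {"extract"}),
    "load": (load, {"summarize"}),
}


def run_pipeline(tasks):
    """Run tasks in dependency order and return every task's result."""
    graph = {name: deps for name, (_, deps) in tasks.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, _ = tasks[name]
        results[name] = func(results)
    return results


if __name__ == "__main__":
    print(run_pipeline(TASKS)["load"])
```

In a real orchestrator the scheduler, retries and logging would come from the engine; the sketch only shows the ordering of tasks by their dependencies.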
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Project mention: I've been using Claude Code for a couple of days | news.ycombinator.com | 2025-03-09
It's fun for things you're OK with throwing away.
For example, I wanted a dbt[0]-like tool, but written in Rust and specifically focused on DuckDB. Claude Code knocked it out[1] without much guidance.
Also added support for all duckdb output options (e.g. write to a partitioned parquet instead of a table).
0 - SQL transformation tool (https://github.com/dbt-labs/dbt-core)
1 - https://github.com/definite-app/crabwalk
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
Here, we use the free Mage Ai orchestration tool.
-
meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
-
Project mention: DocETL – open-source framework for complex document processing pipelines | news.ycombinator.com | 2024-10-21
-
astro-sdk
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
-
reddit-detective
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
-
sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
-
Meltano Singer SDK
Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com (by meltano)
-
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
-
dagster-odp
A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code
Project mention: Declarative Data Pipelines: Moving from Code to Configuration | dev.to | 2025-02-04
To demonstrate how dagster-odp brings these concepts together, we'll implement the same S3 to BigQuery pipeline we discussed earlier, but using a declarative approach. The complete implementation consists of three main components: resource configuration, task definition, and workflow configuration.
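To make the "configuration instead of code" idea concrete, here is a minimal sketch of a config-driven runner with the three parts named above (resources, tasks, workflow). The config layout, task types and function names are all hypothetical; this is not dagster-odp's actual schema or API, and an in-memory list stands in for S3 and BigQuery.

```python
# Hypothetical config layout; this is NOT dagster-odp's real schema.
CONFIG = {
    # Resource configuration: where the data lives (in-memory stand-in).
    "resources": {"source": {"rows": [3, 1, 2]}},
    # Task definitions: what each step does and what it reads.
    "tasks": {
        "read": {"type": "read_resource", "resource": "source"},
        "sort": {"type": "sort_rows", "input": "read"},
    },
    # Workflow configuration: the order in which tasks run.
    "workflow": ["read", "sort"],
}


def read_resource(params, config, outputs):
    # Copy the rows so the config itself is never mutated.
    return list(config["resources"][params["resource"]]["rows"])


def sort_rows(params, config, outputs):
    # Read the named upstream output and return a sorted copy.
    return sorted(outputs[params["input"]])


# Registry mapping a task "type" string to the function implementing it.
TASK_TYPES = {"read_resource": read_resource, "sort_rows": sort_rows}


def run_workflow(config):
    """Execute the tasks listed in the workflow, in order."""
    outputs = {}
    for name in config["workflow"]:
        params = config["tasks"][name]
        outputs[name] = TASK_TYPES[params["type"]](params, config, outputs)
    return outputs


if __name__ == "__main__":
    print(run_workflow(CONFIG)["sort"])
```

Changing the pipeline then means editing `CONFIG` (which could equally be loaded from YAML or JSON) rather than the Python functions.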
-
analytics_data_where_house
An analytics engineering sandbox focusing on real estate prices in Cook County, IL
Project mention: Show HN: OpenTimes – Free travel times between U.S. Census geographies | news.ycombinator.com | 2025-03-17
Thank you for this excellent post! I've been developing [my own platform](https://github.com/MattTriano/analytics_data_where_house) that curates a data warehouse mostly of census and socrata datasets but I haven't really had a good way to share the products with anyone as it's a bit too heavyweight. I've been trying to find alternate solutions to that issue (I'm currently building out a much smaller [platform](https://github.com/MattTriano/fbi_cde_data) to process the FBI's NIBRS datasets), and your post has given me a few great implementations to study and experiment with.
Thanks!
Python elt related posts
-
Show HN: Meltano Cloud (Gitlab spinout) – Managed infra for open source ELT
-
DBT lays off 15% of their staff
-
SQL Mesh - Auto DAG generation!!
-
Data transformation tools other than DBT
-
Semantic Understanding of SQL
-
Virtual Data Environments
-
Index
What are some of the best open-source elt projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Airflow | 39,794 |
2 | airbyte | 17,903 |
3 | dbt-core | 10,692 |
4 | Mage | 8,264 |
5 | dlt | 3,516 |
6 | sqlmesh | 2,256 |
7 | meltano | 2,034 |
8 | docetl | 1,767 |
9 | dbt-metabase | 510 |
10 | versatile-data-kit | 446 |
11 | astro-sdk | 369 |
12 | dbt-coves | 261 |
13 | reddit-detective | 213 |
14 | sayn | 122 |
15 | Meltano Singer SDK | 106 |
16 | dbd | 57 |
17 | dagster-odp | 31 |
18 | analytics_data_where_house | 9 |