Python ELT

Open-source Python projects categorized as ELT

Top 18 Python ELT Projects

  1. Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Project mention: Airflow AI SDK to build simple LLM workflows | news.ycombinator.com | 2025-03-26

    Hi HN,

    We've built an SDK for building DAGs / data pipelines with LLMs in Apache Airflow [1] using Pydantic AI [2] under the hood. I've seen success across the board with Airflow users building simple LLM workflows before moving on to "AI agents". In my experience, the noise around building agents means that people forget that there are other ways to get more immediate value out of LLMs.

    Coupling Airflow for orchestration and Pydantic AI for LLM interactions has turned out to be a very pragmatic approach to building these workflows (and agents). Neither tool "gets in the way" of what you're trying to do. Airflow's been around for 10+ years and has a very well-built orchestration engine rich with everything you need to write production grade data pipelines, and Pydantic AI's been a refreshing take on working with LLMs.

    Would love some feedback from this community!

    [1] https://github.com/apache/airflow

  3. airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Project mention: Personal Picks: Data Product News (April 16, 2025) | dev.to | 2025-04-15
  4. dbt-core

    dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

    Project mention: I've been using Claude Code for a couple of days | news.ycombinator.com | 2025-03-09

    it's fun for things you're ok with throwing away.

    For example, I wanted a dbt[0]-like tool, but written in Rust, specifically focused on DuckDB. Claude Code knocked it out[1] without much guidance.

    Also added support for all duckdb output options (e.g. write to a partitioned parquet instead of a table).

    0 - SQL transformation tool (https://github.com/dbt-labs/dbt-core)

    1 - https://github.com/definite-app/crabwalk

  5. Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

    Project mention: Wk 3 Orchestration: MLOPs with DataTalks | dev.to | 2025-02-22

    Here, we use the free Mage AI orchestration tool.

  6. dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

    Project mention: Data Loading Tool | news.ycombinator.com | 2024-12-14
  7. sqlmesh

    Scalable and efficient data transformation framework - backwards compatible with dbt.

  8. meltano

    Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

  10. docetl

    A system for agentic LLM-powered data processing and ETL

    Project mention: DocETL – open-source framework for complex document processing pipelines | news.ycombinator.com | 2024-10-21
  11. dbt-metabase

    dbt + Metabase integration

  12. versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  13. astro-sdk

    Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

  14. dbt-coves

    CLI tool for dbt users that simplifies the creation of staging model files (yml and sql)

  15. reddit-detective

    Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more

  16. sayn

    Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

  17. Meltano Singer SDK

    Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com (by meltano)

  18. dbd

    dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

  19. dagster-odp

    A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code

    Project mention: Declarative Data Pipelines: Moving from Code to Configuration | dev.to | 2025-02-04

    To demonstrate how dagster-odp brings these concepts together, we'll implement the same S3 to BigQuery pipeline we discussed earlier, but using a declarative approach. The complete implementation consists of three main components: resource configuration, task definition, and workflow configuration.

  20. analytics_data_where_house

    An analytics engineering sandbox focusing on real estate prices in Cook County, IL

    Project mention: Show HN: OpenTimes – Free travel times between U.S. Census geographies | news.ycombinator.com | 2025-03-17

    Thank you for this excellent post! I've been developing [my own platform](https://github.com/MattTriano/analytics_data_where_house) that curates a data warehouse mostly of census and socrata datasets but I haven't really had a good way to share the products with anyone as it's a bit too heavyweight. I've been trying to find alternate solutions to that issue (I'm currently building out a much smaller [platform](https://github.com/MattTriano/fbi_cde_data) to process the FBI's NIBRS datasets), and your post has given me a few great implementations to study and experiment with.

    Thanks!

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source ELT projects in Python? This list will help you:

#   Project                      Stars
1   Airflow                      39,794
2   airbyte                      17,903
3   dbt-core                     10,692
4   Mage                         8,264
5   dlt                          3,516
6   sqlmesh                      2,256
7   meltano                      2,034
8   docetl                       1,767
9   dbt-metabase                 510
10  versatile-data-kit           446
11  astro-sdk                    369
12  dbt-coves                    261
13  reddit-detective             213
14  sayn                         122
15  Meltano Singer SDK           106
16  dbd                          57
17  dagster-odp                  31
18  analytics_data_where_house   9
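
Most of the projects listed here automate some part of the same extract-load-transform pattern: copy source records into a database unchanged, then derive cleaned tables with SQL inside that database. The sketch below shows that pattern using only Python's standard-library sqlite3 module. It does not use the API of any project on this list, and the sample records, table names (`raw_orders`, `customer_totals`), and function names are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical source records; a real pipeline would read these
# from an API, a file, or another database.
RAW_ORDERS = [
    {"id": 1, "customer": "alice", "amount_cents": 1250},
    {"id": 2, "customer": "bob", "amount_cents": 800},
    {"id": 3, "customer": "alice", "amount_cents": 450},
]


def extract():
    """Extract: return the source records unchanged."""
    return list(RAW_ORDERS)


def load(conn, rows):
    """Load: copy the records as-is into a raw staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    # INSERT OR REPLACE keeps the load idempotent when it is re-run.
    conn.executemany(
        "INSERT OR REPLACE INTO raw_orders (id, customer, amount_cents) "
        "VALUES (:id, :customer, :amount_cents)",
        rows,
    )
    conn.commit()


def transform(conn):
    """Transform: build a derived table with SQL inside the database."""
    conn.execute("DROP TABLE IF EXISTS customer_totals")
    conn.execute(
        "CREATE TABLE customer_totals AS "
        "SELECT customer, SUM(amount_cents) AS total_cents, "
        "COUNT(*) AS order_count "
        "FROM raw_orders GROUP BY customer"
    )
    conn.commit()


def run_elt(conn):
    """Run extract, load, and transform, then return the derived rows."""
    load(conn, extract())
    transform(conn)
    return conn.execute(
        "SELECT customer, total_cents, order_count "
        "FROM customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_elt(connection))
    connection.close()
```

The transform step runs after the load and inside the database, which is what distinguishes ELT from ETL. Tools such as dbt-core and sqlmesh focus on that SQL transform step, while tools such as airbyte, dlt, and meltano focus on extraction and loading.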
