Python dbt

Open-source Python projects categorized as dbt

Top 23 Python dbt Projects

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

  • Project mention: A mage on the Hero’s Journey: a fantasy epic on how a startup rose from the ashes | dev.to | 2023-06-12

    In the coming years, Mage will create a cooperative experience so that developers can build data pipelines with their team and level up together. After that journey, Mage will go on an epic quest to create the 1st open world community experience in the data universe.

  • data-diff

    Compare tables within or across databases

  • Project mention: How to Check 2 SQL Tables Are the Same | news.ycombinator.com | 2023-07-26

    If the issue happens a lot, there is also: https://github.com/datafold/data-diff

    It's a nice tool for doing it across databases as well.

    I think it's based on a checksum method.

  • soda-core

    Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  • sqlmesh

    Efficient data transformation and modeling framework that is backwards compatible with dbt.

  • Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14

    There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.

  • dbt-duckdb

    dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)

  • streamify

    A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!

  • piperider

    Code review for data in dbt

  • Project mention: Show HN: PipeRider – open-source Data Impact Analysis for dbt changes | news.ycombinator.com | 2023-09-06
  • astronomer-cosmos

    Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code

  • Project mention: Run dbt projects as Apache Airflow DAGs and Task Groups with a few lines of code | news.ycombinator.com | 2023-05-01
  • dbt-metabase

    dbt + Metabase integration

  • airflow-dbt

    Apache Airflow integration for dbt

  • dbt-data-reliability

    dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

  • grai-core

  • Project mention: Launch HN: Grai (YC S22) – Open-Source Data Observability Platform | news.ycombinator.com | 2023-07-17

    Elastic v2 if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE

  • recs-at-resonable-scale

    Recommendations at "Reasonable Scale": joining dataOps with recSys through dbt, Merlin and Metaflow

  • Project mention: When writing ML software - how do you use TDD? | /r/mlops | 2023-06-25

    Good paper, and in response to that one a team from Coveo wrote this paper on behavioral tests for recommender systems... and also this repo.

  • dbt-clickhouse

    The ClickHouse plugin for dbt (data build tool)

  • dbt-coves

    CLI tool for dbt users that simplifies creating staging model files (YAML and SQL)

  • Project mention: Is there something wrong with me, I hate dbt, what am I missing ? | /r/dataengineering | 2023-05-15

    This just feels like you aren’t using the plentiful tools to make those “mind-numbingly slow” dev steps faster. For ex., using dbt-coves to generate the staging models with casting to types in a couple clicks. And pulling directly from Fivetran tables is just poor practice, with the additional steps needed to do it “right” being inconsequential at best.

  • dbt-athena

    The Athena adapter plugin for dbt (https://getdbt.com) (by dbt-athena)

  • post-modern-stack

    Joining the modern data stack with the modern ML stack

  • Project mention: [Advice] MLOps Course recommendations | /r/datascience | 2023-06-24

    End-to-end stuff, full-fledged stacks: https://github.com/jacopotagliabue/post-modern-stack

  • dbt-databricks

    A dbt adapter for Databricks.

  • Project mention: Curious if anyone has adopted a stack to do raw data ingestion in Databricks? | /r/dataengineering | 2023-04-25

    Our current data infra looks a little something like this:
    1. Airbyte deployed on EKS for supported data connectors. I’m using the alpha Databricks connector to load directly into Unity Catalog.
    1a. S3 bucket for raw landing zone storage if we cannot directly load into Databricks Managed Tables.
    2. Orchestration, storage, and transformations are in Databricks. Calling out to the Airbyte API in the EKS cluster to keep all orchestrations inside Databricks.
    2a. databricks-dbt for transformations & cleaning.

  • dbt-ml-preprocessing

    A SQL port of Python's scikit-learn preprocessing module, provided as cross-database dbt macros.

  • dbt2looker

    Generate LookML views from dbt models

  • dbt-coverage

    One-stop-shop for docs and test coverage of dbt projects.

  • dbterd

    Generate an ERD as code from dbt artifacts

  • Project mention: Anyone have a good way of developing an ERD based off a dbt project; programmatically/automated? | /r/dataengineering | 2023-05-17
  • valmi-activation

    ⚡ valmi.io reverse ETL (data activation) is an open-source (OSS) data activation platform for loading data from warehouses into webhooks and SaaS tools such as Klaviyo, Facebook Ads, Salesforce, and Braze. The Valmi.io Customer Data Platform (CDP) tracks and ingests user activity events from websites, Shopify, and server-side events. https://cloud.valmi.io

  • Project mention: Show HN: Valmi.io Open Source Reverse-ETL Engine | news.ycombinator.com | 2023-06-21

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-09-06.
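The data-diff mention above guesses that the tool is "based on checksum method". The idea behind checksum-based table comparison can be sketched in plain Python: hash a key range in both tables, skip it if the hashes match, and otherwise bisect the range until the differing rows are small enough to compare directly. This is a simplified in-memory illustration, assuming each table is a list of hypothetical `(key, value)` tuples sorted by an integer key; it is not data-diff's code, and a real tool would push the hashing into SQL so rows never leave the database.

```python
import hashlib
from typing import List, Tuple

Row = Tuple[int, str]  # (primary key, serialized row values) -- simplified assumption

def checksum(rows: List[Row]) -> str:
    """Hash a segment of rows; identical segments give identical digests."""
    h = hashlib.sha256()
    for key, value in rows:
        h.update(f"{key}|{value}\n".encode())
    return h.hexdigest()

def diff_keys(a: List[Row], b: List[Row], lo: int, hi: int, min_size: int = 2) -> List[int]:
    """Return the keys in [lo, hi) whose rows differ between tables a and b."""
    seg_a = [r for r in a if lo <= r[0] < hi]
    seg_b = [r for r in b if lo <= r[0] < hi]
    if checksum(seg_a) == checksum(seg_b):
        return []  # whole key range matches, no need to look inside
    if hi - lo <= max(min_size, 1):
        # Segment is small: compare rows one by one
        da, db = dict(seg_a), dict(seg_b)
        return sorted(k for k in da.keys() | db.keys() if da.get(k) != db.get(k))
    mid = (lo + hi) // 2  # bisect the key range and recurse into both halves
    return diff_keys(a, b, lo, mid, min_size) + diff_keys(a, b, mid, hi, min_size)

source = [(i, f"row{i}") for i in range(100)]
target = list(source)
target[42] = (42, "changed")  # updated row
del target[7]                 # row missing from target
print(diff_keys(source, target, 0, 100))  # [7, 42]
```

Only the ranges containing a difference are ever split, so for large, mostly identical tables most of the work is a handful of checksum comparisons.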
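astronomer-cosmos, listed above, runs dbt Core projects as Airflow DAGs. The underlying idea is that each dbt model becomes a task and each upstream dependency becomes an edge. The sketch below does not use the cosmos API; it assumes a hand-written dict loosely shaped like the `nodes` section of dbt's `manifest.json` (real manifests carry many more fields) and uses Python's standard `graphlib` module (Python 3.9+) to compute one valid execution order.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical, heavily trimmed stand-in for manifest.json["nodes"]
nodes: Dict[str, dict] = {
    "model.shop.stg_orders": {"depends_on": {"nodes": []}},
    "model.shop.stg_customers": {"depends_on": {"nodes": []}},
    "model.shop.orders": {
        "depends_on": {"nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]}
    },
    "model.shop.revenue": {"depends_on": {"nodes": ["model.shop.orders"]}},
}

def execution_order(nodes: Dict[str, dict]) -> List[str]:
    """Order model ids so each model comes after all of its upstream models."""
    # graphlib expects {node: set_of_predecessors}, which matches depends_on
    graph = {uid: set(node["depends_on"]["nodes"]) for uid, node in nodes.items()}
    return list(TopologicalSorter(graph).static_order())

print(execution_order(nodes))
```

The two staging models have no dependency on each other, so their relative order is not fixed; an orchestrator can run them in parallel.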
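The dbt-coves mention above refers to generating staging models "with casting to types". The boilerplate such a generator saves you from is a model that reads a dbt source and casts every column. A minimal sketch of that rendering step, assuming a hypothetical column-to-type mapping passed in by hand (dbt-coves itself inspects the warehouse and uses its own templates):

```python
from typing import Dict

def staging_model_sql(source_name: str, table: str, columns: Dict[str, str]) -> str:
    """Render a dbt staging model that casts each source column to a target type."""
    casts = ",\n".join(f"    cast({col} as {typ}) as {col}" for col, typ in columns.items())
    return (
        "with source as (\n"
        # {{{{ and }}}} in an f-string emit the literal {{ }} that Jinja expects
        f"    select * from {{{{ source('{source_name}', '{table}') }}}}\n"
        ")\n\n"
        "select\n"
        f"{casts}\n"
        "from source\n"
    )

print(staging_model_sql("fivetran", "orders", {"id": "integer", "amount": "numeric(12, 2)"}))
```

Selecting through `source()` rather than querying the loader's tables directly is what gives the staging layer its place in the dbt lineage graph.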
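dbt-ml-preprocessing, listed above, ports scikit-learn preprocessing to SQL. As an illustration of what such a port involves, standard scaling computes (x - mean) / std with the population standard deviation, which maps onto window aggregates. The helper below is a hypothetical sketch, not one of the package's macros, and assumes a SQL dialect that supports `avg` and `stddev_pop` as window aggregates; a plain-Python reference computation is included for comparison.

```python
from statistics import fmean, pstdev
from typing import List

def standard_scaler_sql(column: str) -> str:
    """Render a SQL expression that standardizes one column, StandardScaler-style."""
    # nullif avoids division by zero; a constant column becomes NULL here,
    # whereas scikit-learn would leave it at 0.
    return (
        f"({column} - avg({column}) over ()) "
        f"/ nullif(stddev_pop({column}) over (), 0) as {column}_scaled"
    )

def standard_scale(values: List[float]) -> List[float]:
    """Reference computation in Python using the population std (ddof=0)."""
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # mirrors scikit-learn's handling of zero variance
    return [(v - mu) / sigma for v in values]

print(standard_scaler_sql("price"))
print(standard_scale([1.0, 2.0, 3.0]))
```

The population standard deviation matters: using a sample standard deviation (`stddev_samp`) would give slightly different values from scikit-learn's `StandardScaler`.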
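dbt2looker, listed above, generates LookML views from dbt models. At its simplest this means mapping each model column to a LookML `dimension`. A hypothetical sketch, assuming a simplified `{column: warehouse_type}` input and a small made-up type mapping (dbt2looker reads dbt's artifacts and applies its own mapping rules):

```python
from typing import Dict

# Hypothetical mapping from warehouse types to LookML dimension types
LOOKML_TYPES = {"integer": "number", "numeric": "number", "varchar": "string", "boolean": "yesno"}

def lookml_view(model: str, table: str, columns: Dict[str, str]) -> str:
    """Render a LookML view with one dimension per column."""
    lines = [f"view: {model} {{", f"  sql_table_name: {table} ;;"]
    for col, wh_type in columns.items():
        lines += [
            f"  dimension: {col} {{",
            # Unmapped types fall back to string in this sketch
            f"    type: {LOOKML_TYPES.get(wh_type, 'string')}",
            f"    sql: ${{TABLE}}.{col} ;;",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)

print(lookml_view("orders", "analytics.orders", {"id": "integer", "status": "varchar"}))
```

Timestamps would normally become a `dimension_group` rather than a plain dimension; that case is left out to keep the sketch short.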
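dbt-coverage, listed above, reports documentation and test coverage. Documentation coverage boils down to the share of columns that carry a non-empty description. A minimal sketch over a hand-written dict loosely shaped like dbt's `manifest.json` (`nodes` → `columns` → `description`); this is an assumption-laden simplification, not dbt-coverage's implementation:

```python
def doc_coverage(manifest: dict) -> float:
    """Fraction of model columns that have a non-empty description."""
    total = documented = 0
    for node in manifest["nodes"].values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        for col in node.get("columns", {}).values():
            total += 1
            if col.get("description", "").strip():
                documented += 1
    return documented / total if total else 0.0

# Hypothetical, trimmed manifest for illustration
manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "columns": {"id": {"description": "Primary key"}, "amount": {"description": ""}},
        },
        "model.shop.customers": {
            "resource_type": "model",
            "columns": {"id": {"description": "Customer id"}, "email": {"description": ""}},
        },
        "test.shop.unique_orders_id": {"resource_type": "test", "columns": {}},
    }
}
print(f"{doc_coverage(manifest):.0%}")  # 50%
```

Note that a manifest only lists columns declared in YAML, so a fuller tool also needs the warehouse's actual column list to count columns nobody declared.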
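dbterd, listed above, generates ERDs from dbt artifacts. In dbt, foreign keys are commonly expressed as `relationships` tests, which is the kind of information an ERD generator can draw on. The sketch below assumes the relationships have already been extracted into hypothetical `(child_model, child_column, parent_model, parent_column)` tuples and renders Mermaid `erDiagram` text; it is not dbterd's implementation or CLI.

```python
from typing import List, Tuple

# Hypothetical pre-extracted relationships:
# (child_model, child_column, parent_model, parent_column)
Relationship = Tuple[str, str, str, str]

def to_mermaid(relationships: List[Relationship]) -> str:
    """Render relationships as a Mermaid erDiagram."""
    lines = ["erDiagram"]
    for child, col, parent, field in relationships:
        # ||--o{ : one parent row relates to zero or more child rows
        lines.append(f'    {parent.upper()} ||--o{{ {child.upper()} : "{col} to {field}"')
    return "\n".join(lines)

print(to_mermaid([
    ("orders", "customer_id", "customers", "id"),
    ("order_items", "order_id", "orders", "id"),
]))
```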

Index

What are some of the best open-source dbt projects in Python? This list will help you:

Rank  Project                  Stars
1     Mage                     6,953
2     data-diff                2,830
3     soda-core                1,745
4     sqlmesh                  1,231
5     dbt-duckdb               719
6     streamify                474
7     piperider                466
8     astronomer-cosmos        442
9     dbt-metabase             422
10    airflow-dbt              378
11    dbt-data-reliability     338
12    grai-core                267
13    recs-at-resonable-scale  218
14    dbt-clickhouse           208
15    dbt-coves                204
16    dbt-athena               182
17    post-modern-stack        180
18    dbt-databricks           179
19    dbt-ml-preprocessing     175
20    dbt2looker               171
21    dbt-coverage             167
22    dbterd                   162
23    valmi-activation         126