Top 23 Python dbt Projects
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
-
soda-core
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
-
streamify
A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!
-
astronomer-cosmos
Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
-
dbt-data-reliability
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
-
recs-at-resonable-scale
Recommendations at "Reasonable Scale": joining dataOps with recSys through dbt, Merlin and Metaflow
-
dbt-ml-preprocessing
A SQL port of Python's scikit-learn preprocessing module, provided as cross-database dbt macros.
-
valmi-activation
⚡ valmi.io reverse ETL (data activation) is an open-source (OSS) data activation platform that loads data from warehouses into webhooks and SaaS tools such as Klaviyo, Facebook Ads, Salesforce, and Braze. The Valmi.io Customer Data Platform (CDP) helps track and ingest user activity events from websites, Shopify, and server-side events. https://cloud.valmi.io
Project mention: A mage on the Hero’s Journey: a fantasy epic on how a startup rose from the ashes | dev.to | 2023-06-12
In the coming years, Mage will create a cooperative experience so that developers can build data pipelines with their team and level up together. After that journey, Mage will go on an epic quest to create the 1st open world community experience in the data universe.
If the issue happens a lot, there is also: https://github.com/datafold/data-diff
That is a nice tool for doing it across databases as well.
I think it's based on a checksum method.
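To make the "checksum method" idea concrete, here is a minimal sketch of how a checksum-based table diff can work in general. It is not taken from data-diff's code, and the row shape, the function names (`segment_checksum`, `diff_keys`), and the `threshold` parameter are all assumptions made for illustration. The idea: hash a range of rows on each side, and only split and re-check the ranges whose hashes disagree.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical table shape: primary key -> row payload (already serialized).
Table = Dict[int, str]


def segment_checksum(rows: List[Tuple[int, str]]) -> str:
    """Hash an ordered list of (key, payload) rows into one hex digest."""
    digest = hashlib.md5()
    for key, payload in rows:
        digest.update(f"{key}|{payload}\n".encode("utf-8"))
    return digest.hexdigest()


def _diff_segment(a: Table, b: Table, keys: List[int], threshold: int) -> List[int]:
    rows_a = [(k, a[k]) for k in keys if k in a]
    rows_b = [(k, b[k]) for k in keys if k in b]
    # Matching checksums: treat the whole key range as identical and stop.
    if segment_checksum(rows_a) == segment_checksum(rows_b):
        return []
    # Small enough segment: compare rows directly.
    if len(keys) <= threshold:
        return [k for k in keys if a.get(k) != b.get(k)]
    # Otherwise split the key range in half and recurse into both halves.
    mid = len(keys) // 2
    return (_diff_segment(a, b, keys[:mid], threshold)
            + _diff_segment(a, b, keys[mid:], threshold))


def diff_keys(a: Table, b: Table, threshold: int = 4) -> List[int]:
    """Return the sorted primary keys that are changed, missing, or added."""
    keys = sorted(set(a) | set(b))
    return _diff_segment(a, b, keys, max(threshold, 1))
```

In a real cross-database setting the per-segment checksums would presumably be computed inside each database, so that only digests, not rows, travel over the network; the in-memory dictionaries here only stand in for that.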
Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14
There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.
Project mention: Show HN: PipeRider – open-source Data Impact Analysis for dbt changes | news.ycombinator.com | 2023-09-06
Project mention: Run dbt projects as Apache Airflow DAGs and Task Groups with a few lines of code | news.ycombinator.com | 2023-05-01
Project mention: Launch HN: Grai (YC S22) – Open-Source Data Observability Platform | news.ycombinator.com | 2023-07-17
Elastic v2 if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE
Good paper, and in response to that one, a team from Coveo wrote this paper on behavioral tests for recommender systems... and also this repo.
Project mention: Is there something wrong with me, I hate dbt, what am I missing ? | /r/dataengineering | 2023-05-15
This just feels like you aren’t using the plentiful tools to make those “mind-numbingly slow” dev steps faster. For example, using dbt-coves to generate the staging models with casting to types in a couple of clicks. And pulling directly from Fivetran tables is just poor practice, with the additional steps needed to do it “right” being inconsequential at best.
End-to-end stuff, full-fledged stacks: https://github.com/jacopotagliabue/post-modern-stack
Project mention: Curious if anyone has adopted a stack to do raw data ingestion in Databricks? | /r/dataengineering | 2023-04-25
Our current data infra looks a little something like this:
1. Airbyte deployed on EKS for supported data connectors. I’m using the alpha Databricks connector to load directly into Unity Catalog.
1a. S3 bucket for raw landing zone storage if we cannot directly load into Databricks Managed Tables.
2. Orchestration, storage, and transformations are in Databricks. Calling out to the Airbyte API in the EKS cluster to keep all orchestrations inside Databricks.
2a. databricks-dbt for transformations & cleaning.
Project mention: Anyone have a good way of developing an ERD based off a dbt project; programmatically/automated? | /r/dataengineering | 2023-05-17
Project mention: Show HN: Valmi.io Open Source Reverse-ETL Engine | news.ycombinator.com | 2023-06-21
Python dbt related posts
- Launch HN: Grai (YC S22) – Open-Source Data Observability Platform
- When writing ML software - how do you use TDD?
- [Advice] MLOps Course recommendations
- Run dbt projects as Apache Airflow DAGs and Task Groups with a few lines of code
- Curious if anyone has adopted a stack to do raw data ingestion in Databricks?
- Running dbt core on airflow
- dolly-v2-12b
Index
What are some of the best open-source dbt projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Mage | 6,953 |
2 | data-diff | 2,830 |
3 | soda-core | 1,745 |
4 | sqlmesh | 1,231 |
5 | dbt-duckdb | 719 |
6 | streamify | 474 |
7 | piperider | 466 |
8 | astronomer-cosmos | 442 |
9 | dbt-metabase | 422 |
10 | airflow-dbt | 378 |
11 | dbt-data-reliability | 338 |
12 | grai-core | 267 |
13 | recs-at-resonable-scale | 218 |
14 | dbt-clickhouse | 208 |
15 | dbt-coves | 204 |
16 | dbt-athena | 182 |
17 | post-modern-stack | 180 |
18 | dbt-databricks | 179 |
19 | dbt-ml-preprocessing | 175 |
20 | dbt2looker | 171 |
21 | dbt-coverage | 167 |
22 | dbterd | 162 |
23 | valmi-activation | 126 |