Open-source projects categorized as dbt

Top 10 dbt Open-Source Projects

  • GitHub repo soda-sql

    Data profiling, testing, and monitoring for SQL accessible data.

    Project mention: Being constantly shut down by more senior team members when I mention adding some QA in our work | reddit.com/r/dataengineering | 2022-01-10

    As many have said, there might be a business side of things to deliver. Somebody above promised delivery with tight deadlines. Trust me, I am not a fan, but this is how the world works, and it sucks. I would say in your free time, explore tools like Great Expectations (https://greatexpectations.io/) or soda-sql (https://github.com/sodadata/soda-sql), which are modern ways of testing, as part of your learning curve.

  • GitHub repo lightdash

    An open source alternative to Looker built using dbt. Made for analysts ❤️

    Project mention: Launch HN: Metaplane (YC W20) – Datadog for Data | news.ycombinator.com | 2021-11-15

    1) An integration with Metabase Cloud is on our roadmap for Q1! We'd love to integrate with Lightdash, but they don't have a public API just yet[1].

    2) Several of our customers use us to alert on schema changes in Postgres, specifically so they can get ahead of application database changes that will end up in the warehouse, so you're definitely not alone! Here's a link on how to connect postgres: https://docs.metaplane.dev/docs/postgres

    That's an excellent stack and one we kept front and center when building out Metaplane, so definitely let us know if you have any feedback or suggestions here!

    [1]: https://github.com/lightdash/lightdash/issues/632

  • GitHub repo fal

    do more with dbt. fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.

    Project mention: Do I need orchestration for a Fivetran-dbt stack? | reddit.com/r/dataengineering | 2021-12-05

    Yes, I agree with you that having Fivetran/Airbyte and dbt covers a lot of the Airflow use cases. That being said, you might still want to run some scripts after the dbt transformation is over. We ran into this exact problem and built a useful CLI tool for running Python scripts alongside the dbt run.

  • GitHub repo metriql

    The metrics layer for your data. Join us at https://metriql.com/slack

    Project mention: Open source Business intelligence platform made with Python | news.ycombinator.com | 2021-11-28

    We're using Superset to enable our analysts to explore our clients' SEM/SEO/analytics data. It also posts alerts to Slack when, say, the daily session count of a website isn't what was expected given the historical data.

    Yeah, it's a little rough to get going, but once it is, we've found it to be a really powerful (and actively developed!) BI tool. It's even better with dbt + MetriQL [0], which can automatically sync Superset's dataset metadata directly with properties you set up in dbt.

    Adding custom visualizations is much harder than it should be, but they're very much aware of that, and working to address it. Their Slack community is super-helpful, too.

    [0]: https://metriql.com

  • GitHub repo dbt-spotify-analytics

    Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase

    Project mention: Has anyone taken Coursera Data Engineering Foundations Course? | reddit.com/r/dataengineering | 2021-06-03

  • GitHub repo dbt-coverage

    One-stop-shop for docs and test coverage of dbt projects.

    Project mention: dbt Coalesce 2021 takeaways | reddit.com/r/dataengineering | 2021-12-10

    Slido developed dbt-coverage to get a documentation coverage number you could put in your CI/CD to naively push back on analysts' merge requests :D

  • GitHub repo dbt-sessionization

    Using DBT for Creating Session Abstractions on RudderStack - an open-source, warehouse-first customer data pipeline and Segment alternative.

    Project mention: Send Form Data From Marketo to Multiple Destinations Using RudderStack | dev.to | 2022-01-13

    By using RudderStack to understand how users are finding and interacting with your site and then combining that with the data collected by your Marketo forms, you'll get deeper insights about your potential customers and provide higher quality leads to your sales team.

  • GitHub repo trino_data_mesh

    Proof of concept on how to gain insights with Trino across different databases from a distributed data mesh

    Project mention: What even is data mesh | news.ycombinator.com | 2021-07-29

    Not central to the main ideas of this article, but if you want to have a data mesh that is self-service, why force folks to use a particular storage medium like a data warehouse? That still requires centralization of the data.

    Why not instead have a tool like Trino (https://trino.io) that allows you to let different domains use whatever datastore they happen to use. You still would need to enforce schema, but this can be done in tools like schema registry as mentioned in the article along with a data cataloging tool.

    These tools facilitate the distributed nature of the problem nicely and encourage healthy standards to be discussed and then formalized in schema definitions and catalogs that remove the ambiguity of discourse and documentation.

    A nice example is laid out in this repo of how Trino can accomplish data mesh principles 1 and 3 (https://github.com/findinpath/trino_data_mesh).

  • GitHub repo dbtTestExamples

    Some examples of dbt schema tests and data tests inside a simple dbt model. This is for anyone interested in learning how to implement dbt tests and the limitations around them.

    Project mention: 2 Critical Fixes For Installing dbt 0.19.0 | dev.to | 2021-03-24

    I hope these quick facts about dbt installation are helpful for you. If you'd like to see a dbt project in action, please feel free to clone my dbtTestExamples repository on GitHub and learn how to connect a dbt model and tests to a Google BigQuery instance.

  • GitHub repo dbt-customer-journey-analysis

    Using DBT for Customer Journey Analysis on RudderStack - an open-source, warehouse-first customer data pipeline and Segment alternative.

    Project mention: Customer Session Analysis Using dbt and RudderStack | dev.to | 2022-01-03

    dbt_project.yml - Every dbt project has a dbt_project.yml file. These files are written in YAML and define common conventions and properties. For our project, the highlights from this file include the name, the version, and that we want our models to be materialized as views.
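
    As a rough illustration of the file described above, a minimal dbt_project.yml might look like the sketch below. The project name, version string, and profile name are placeholders assumed here, not values from the RudderStack project; only the view materialization comes from the post.

    ```yaml
    # dbt_project.yml (hypothetical sketch; names below are placeholders)
    name: 'my_dbt_project'      # assumed project name
    version: '1.0.0'            # assumed version string
    config-version: 2

    profile: 'my_profile'       # assumed connection profile from profiles.yml

    models:
      my_dbt_project:           # must match the project name above
        +materialized: view     # build every model in the project as a view
    ```

    Setting the materialization at the project level keeps individual model files free of config blocks; a single model can still override it if it needs to be a table.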

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-13.

What are some of the best open-source dbt projects? This list will help you:

#    Project                          Stars
1    soda-sql                         673
2    lightdash                        570
3    fal                              171
4    metriql                          121
5    dbt-spotify-analytics            61
6    dbt-coverage                     48
7    dbt-sessionization               9
8    trino_data_mesh                  6
9    dbtTestExamples                  2
10   dbt-customer-journey-analysis    2