dbt-utils
learn-era-by-example
Metric | dbt-utils | learn-era-by-example
---|---|---
Mentions | 7 | 1
Stars | 1,213 | 6
Growth | 2.9% | -
Activity | 6.2 | 8.4
Last commit | 10 days ago | about 2 months ago
Language | Python | TypeScript
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dbt-utils
-
Show HN: Nasty, a cross warehouse, type checked, unit testable analytics library
// To get around this, we can use the approach outlined by how dbt does ansi sql generate_series
// https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/generate_series.sql
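The cross-join approach referenced above can be sketched as follows. This is a minimal illustration in Python against an in-memory SQLite database, not the dbt-utils macro itself; the function names and the `digit` column are hypothetical. The idea is to cross join a two-row table with itself once per binary digit, so that every combination of 0s and 1s encodes one integer, and then filter down to the requested upper bound.

```python
import sqlite3


def generate_series_sql(upper_bound: int) -> str:
    """Build plain SQL that yields the integers 1..upper_bound (hypothetical helper)."""
    if upper_bound < 1:
        raise ValueError("upper_bound must be >= 1")
    # Number of binary digits needed so that 2**n_bits >= upper_bound.
    n_bits = max(1, (upper_bound - 1).bit_length())
    # Each alias p0, p1, ... contributes one binary digit weighted by 2**i.
    terms = " + ".join(f"p{i}.digit * {2 ** i}" for i in range(n_bits))
    joins = " cross join ".join(f"p as p{i}" for i in range(n_bits))
    return (
        "with p as (select 0 as digit union all select 1), "
        f"numbers as (select {terms} + 1 as n from {joins}) "
        f"select n from numbers where n <= {upper_bound} order by n"
    )


def generate_series(upper_bound: int) -> list:
    """Run the generated SQL in SQLite and return the numbers as a list."""
    conn = sqlite3.connect(":memory:")
    try:
        rows = conn.execute(generate_series_sql(upper_bound)).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


series = generate_series(10)
print(series)
```

Because the query uses only `union all`, `cross join`, and arithmetic, it should not depend on a warehouse-specific `generate_series` function, which is the point of the approach.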
-
Anything one should know before going for self-hosted dbt?
I got bit by dbt-utils/deduplicate naively removing any row that contained a null in it recently, but fortunately there was a workaround for Databricks and a few other flavors of SQL.
-
Managing SQL Tests
I'm used to utilising dbt and defining my tests there (along with dbt-utils or https://github.com/calogica/dbt-expectations): I simply add a list item to a column definition and can already define a great number of tests without having to copy code. I can even extend the predefined ones using generic tests. Writing custom tests also integrates nicely. Additionally, it's very convenient to tag tests or define a severity. The learning curve for a business engineer is almost flat as long as they know some SQL.
-
Dbt to acquire Transform to build out its semantic layer
My top three:
- Dev/staging/prod environments, so you can check numbers before pushing to production.
- Unions between two sources that are not the same shape can be done without the headache. https://github.com/dbt-labs/dbt-utils#union_relations-source
- Macros for common `case when` statements.
-
Analytics Stacks for Startups
Add tests: unit tests in SQL are still not really practical, but testing the data before allowing users to see it is possible. dbt has some basic tests like `not_null` and so on. dbt_utils supports comparing data across tables. If you need more, there is Great Expectations and similar tools. dbt also supports writing SQL queries which output “bad” rows. Use this to, e.g., check a specific order against manually checked correct data. Tests give you confidence that your pipelines produce correct results: nothing is worse than waking up to a Slack message from your boss saying that the graphs look wrong… They are especially useful in case you have to refactor a data pipeline. Basically every query you would run during the QA phase of a change request has a high potential to become an automatic test.
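The "query that outputs bad rows" idea can be sketched outside dbt as well. The example below is a minimal illustration using Python's built-in `sqlite3`; the `orders` table, its columns, and the rule being checked are all hypothetical. The convention is the same as described above: the test passes when the query returns no rows.

```python
import sqlite3

# Hypothetical rule: every order must have a non-null, non-negative amount.
BAD_ROWS_SQL = """
select order_id, amount
from orders
where amount is null or amount < 0
order by order_id
"""


def run_data_test(conn, sql):
    """Return the offending rows; an empty list means the test passes."""
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer, amount real)")
conn.executemany(
    "insert into orders values (?, ?)",
    [(1, 19.99), (2, -5.0), (3, None)],  # rows 2 and 3 violate the rule
)

bad_rows = run_data_test(conn, BAD_ROWS_SQL)
status = "passed" if not bad_rows else f"failed with {len(bad_rows)} bad row(s)"
print(status, bad_rows)
```

Returning the failing rows rather than a boolean makes a failure immediately debuggable, since the offending records are already in hand.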
- Why is Data Build Tool (DBT) is so popular? What are some other alternatives?
-
Unit testing SQL in DBT
The equality test macro is also in the dbt-utils package from fishtown at https://github.com/fishtown-analytics/dbt-utils/blob/master/macros/schema_tests/equality.sql
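As a rough sketch of what an equality check between two tables can look like, the example below compares them with `except` in both directions and reports any rows found on only one side. It is an illustration under stated assumptions, not the macro's code: the helper name, the `actual` and `expected` tables, and the `side` label are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3


def equality_diff(conn, table_a, table_b):
    """Return rows present in only one of the two tables (empty list means equal).

    Table names are interpolated directly, so they must be trusted identifiers.
    """
    sql = f"""
    with a_minus_b as (select * from {table_a} except select * from {table_b}),
         b_minus_a as (select * from {table_b} except select * from {table_a})
    select 'a_minus_b' as side, * from a_minus_b
    union all
    select 'b_minus_a' as side, * from b_minus_a
    order by 1, 2
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table actual (id integer, label text)")
conn.execute("create table expected (id integer, label text)")
conn.executemany("insert into actual values (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("insert into expected values (?, ?)", [(1, "a"), (2, "c")])

diff = equality_diff(conn, "actual", "expected")
print(diff)
```

Note that `except` works on sets, so this sketch does not detect a difference in how many times an identical row appears in each table.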
learn-era-by-example
-
Show HN: Nasty, a cross warehouse, type checked, unit testable analytics library
Hi, I'm Grant, I'm one of the primary NASTY maintainers along with Tom and TJ.
One thing that's on getnasty.dev but worth calling out explicitly with a link is the set of Ruby-koans-style, TDD learn-by-example exercises we have for NASTY. They should only require Node.js, installed in whatever way is easiest.
https://github.com/coterahq/learn-nasty-by-example
Another small but interesting thing that I think people who work in SQL would find cool is `invariants`, which perform runtime data integrity checks:
https://getnasty.dev/docs/invariants
What are some alternatives?
dbt-expectations - Port(ish) of Great Expectations to dbt test macros
sqlfluff - A modular SQL linter and auto-formatter with support for multiple dialects and templated code.
dbt-oracle - A dbt adapter for oracle db backend
nodejs-bigquery - Node.js client for Google Cloud BigQuery: A fast, economical and fully-managed enterprise data warehouse for large-scale data analytics.
streamlit - Streamlit — A faster way to build and share data apps.