daggy vs dbt-expectations

| | daggy | dbt-expectations |
|---|---|---|
| Mentions | 2 | 10 |
| Stars | - | 947 |
| Growth | - | 2.4% |
| Activity | - | 6.6 |
| Latest Commit | - | 8 days ago |
| Language | Shell | - |
| License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
daggy
-
ETL Pipelines with Airflow: The Good, the Bad and the Ugly
Thanks for the feedback. I'll take a look at how Luigi models task state. Right now each TaskExecutor type is responsible for running and reporting on tasks (e.g. the Slurm executor submits jobs and monitors them for completion). I was considering adding a companion "verify" stage for every vertex, which would be a command that runs and verifies output. It might be a way to do what I think you're describing above without having to build a variety of expected-output types into the daggy core. I'll check what Luigi is doing, though.
> resuming a partially failed build
Daggy does this! Right now it will keep running the DAG until every path is completed, or until all remaining vertices in a processing state (queued, running, retry, error) have ended up in the error state, at which point the DAG itself goes to an error state.
It's possible to explicitly set task/vertex states (e.g. mark it complete if the step was manually completed), then change the DAG state to QUEUED, at which point the DAG will resume execution from where it left off. [1] is a unit test that walks through that functionality.
[1] https://gitlab.com/iroddis/daggy/-/blob/master/tests/unit_se...
dbt-expectations
-
Dbt tests vs Soda SQL
I haven't used Soda, but dbt is indeed pretty good, especially once you add dbt-expectations.
-
Data-eng related highlights from the latest Thoughtworks Tech Radar
dbt-expectations
-
Data Quality Dimensions: Assuring Your Data Quality with Great Expectations
I highly, highly recommend the dbt-expectations extension from Calogica for dbt. It's a port of Great Expectations, except you can quickly drop it into your schema.yml files and have it run as part of your dbt test process. Super powerful, and it's prevented us from shipping bad data many times.
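To make the comment above concrete, here is a minimal sketch of what dropping dbt-expectations tests into a schema.yml looks like. The model and column names (orders, order_total) are hypothetical; the test names are real dbt-expectations tests.

```yaml
version: 2

models:
  - name: orders            # hypothetical model name
    columns:
      - name: order_total   # hypothetical column name
        tests:
          # dbt-expectations tests run alongside built-ins via `dbt test`
          - dbt_expectations.expect_column_values_to_not_be_null
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
```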
-
Managing SQL Tests
I'm used to utilising dbt and defining my tests there (along with dbt-utils or https://github.com/calogica/dbt-expectations): I simply add a list item to a column definition and can already define a great number of tests without having to copy code. I can even extend the pre-defined ones using generic tests. Writing custom tests also integrates nicely. Additionally, it's very convenient to tag tests or define a severity. The learning curve for a business engineer is almost flat as long as they know some SQL.
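The tagging and severity the comment mentions are standard dbt test configs and apply to dbt-expectations tests too; a sketch with hypothetical model/column names:

```yaml
version: 2

models:
  - name: customers          # hypothetical model name
    columns:
      - name: email          # hypothetical column name
        tests:
          - dbt_expectations.expect_column_values_to_not_be_null:
              config:
                severity: warn         # warn instead of failing the run
                tags: ['data_quality'] # select with `dbt test -s tag:data_quality`
```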
-
What are some Data Quality check related frameworks for datasets ranging from 100GB to 1TB in size?
Use dbt's testing functionality during your transformations with calogica/dbt-expectations (the Great Expectations framework ported to dbt).
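For tables in the 100GB-1TB range, table-level expectations are a cheap first line of defense, since they compile to a single aggregate query rather than per-row checks on every column. A hedged sketch with a hypothetical model name and an illustrative row-count range:

```yaml
version: 2

models:
  - name: events   # hypothetical large model
    tests:
      # real dbt-expectations test; bounds here are illustrative
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1
          max_value: 1000000000
```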
-
Great Expectations is annoyingly cumbersome
Check out dbt-expectations https://github.com/calogica/dbt-expectations
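For completeness, installing it is a packages.yml entry followed by `dbt deps`; the version range below is illustrative, not a pinned recommendation:

```yaml
# packages.yml
packages:
  - package: calogica/dbt_expectations
    version: [">=0.8.0", "<0.9.0"]  # illustrative; check the repo for current releases
```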
-
CI/CD in data engineering - help a noob
There are certain things I would like to add, such as data quality checks. I could use something like dbt-expectations, but I'm not sure how much more I should force it before getting an Airflow setup.
-
How do you query and quality check data produced in intermediate steps in an analytics pipeline?
-
ETL Pipelines with Airflow: The Good, the Bad and the Ugly
[dbt Labs employee here]
Check out the dbt-expectations package [1]. It's a port of the Great Expectations checks to dbt tests. The advantage is that you don't need another tool for these fairly standard checks, and they can be incorporated into dbt workflows early.
[1] https://github.com/calogica/dbt-expectations
-
Unit testing SQL in DBT
Also check out dbt-expectations, a port of Great Expectations that greatly expands the set of configurable (non-assert) tests.
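An example of the kind of configurable test meant here: parameterized checks such as regex matching, which dbt's four built-ins (unique, not_null, accepted_values, relationships) don't cover. Model and column names are hypothetical; the test and its regex parameter are real dbt-expectations features.

```yaml
version: 2

models:
  - name: users              # hypothetical model name
    columns:
      - name: phone_number   # hypothetical column name
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "^\\+?[0-9]{7,15}$"  # illustrative pattern
```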
What are some alternatives?
Scio - A Scala API for Apache Beam and Google Cloud Dataflow.
dbt-utils - Utility functions for dbt projects.
materialize - The data warehouse for operational workloads.
dbt-oracle - A dbt adapter for the Oracle database backend.
NVTabular - NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
cuetils - CLI and library for diff, patch, and ETL operations on CUE, JSON, and YAML.
dbt-fal - Do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies, and build machine learning models.
pandera - A light-weight, flexible, and expressive statistical data testing library