hudi vs dbt-expectations

hudi

Upserts, Deletes And Incremental Processing on Big Data. (by apache)

Source Code

hudi.apache.org

dbt-expectations

Port(ish) of Great Expectations to dbt test macros (by calogica)

dbt

Source Code

calogica.github.io

Project	hudi	dbt-expectations
Mentions	20	10
Stars	5,085	947
Growth	1.4%	2.4%
Activity	9.9	6.6
Latest Commit	2 days ago	12 days ago
Language	Java	Shell
License	Apache License 2.0	Apache License 2.0

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.

hudi

Posts with mentions or reviews of hudi. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-18.

Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog
4 projects | dev.to | 18 Dec 2023

Apache Iceberg is one of the three types of lakehouse, the other two are Apache Hudi and Delta Lake.

The "Big Three's" Data Storage Offerings
2 projects | /r/dataengineering | 15 Jun 2023

Structured, Semi-structured and Unstructured can be stored in one single format, a lakehouse storage format like Delta, Iceberg or Hudi (assuming those don't require low-latency SLAs like subsecond).

Data-eng related highlights from the latest Thoughtworks Tech Radar
3 projects | /r/dataengineering | 26 Apr 2023

Apache Hudi

For those of you with Lakehouse Architectures, how do you handle duplicate records?
1 project | /r/dataengineering | 16 Apr 2023

AWS ACID data lakehouse
1 project | /r/dataengineering | 30 Jan 2023

Try Apache Hudi, it is fully integrated with AWS and offers almost everything that you requested.

Data n00b looking for guidance on how to setup data lake/warehouse
1 project | /r/dataengineering | 29 Oct 2022

the corresponding kafka topics have 30d retention and I intend on having s3 sink connector for long term storage (open to other ideas here too, I noticed theres a hudi connector also)

apache/hudi: Upserts, Deletes And Incremental Processing on Big Data.
1 project | /r/devopsish | 20 Oct 2022

Big Data file formats
1 project | /r/apachespark | 13 Jun 2022

How-to-Guide: Contributing to Open Source
19 projects | /r/dataengineering | 11 Jun 2022

Apache Hudi

What do you use for Data versioning?
1 project | /r/mlops | 28 Mar 2022

You could have a look at Apache Hudi - especially if you're running your Data Pipelines using Spark or Flink.

dbt-expectations

Posts with mentions or reviews of dbt-expectations. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-26.

Dbt tests vs Soda SQL
1 project | /r/dataengineering | 26 May 2023

Have not used Soda, but dbt indeed is pretty good especially when adding dbt-expectations

Data-eng related highlights from the latest Thoughtworks Tech Radar
3 projects | /r/dataengineering | 26 Apr 2023

dbt-expectations

Data Quality Dimensions: Assuring Your Data Quality with Great Expectations
1 project | /r/dataengineering | 30 Mar 2023

I highly.. highly.. recommend the dbt-expectations extension from Calogica for dbt. It's a port of Great Expectations, except you can quickly thunk it in your schema.yml's and have it run as part of your dbt test process. Super powerful and it's prevented us from shipping bad data many times.

Managing SQL Tests
2 projects | /r/dataengineering | 30 Mar 2023

I'm used to utilising dbt and defining my tests there (along with dbt-utils or https://github.com/calogica/dbt-expectations): I simply add a list item to a column definition and can already define a great number of tests without having to copy code. I can even extend the pre-defined using generic tests. Writing custom tests also integrates nicely. Additionally it's very convenient to tag tests or define a severity. The learning curve for a business engineer is almost flat as long as they know some SQL.

What are some Data Quality check related frameworks for datasets ranging from 100GB to 1TB in size?
1 project | /r/dataengineering | 30 Dec 2022

Use dbt's testing functionality during your transformations with calogica/dbt-expectations (Great Expectations framework ported to dbt)

Great Expectations is annoyingly cumbersome
3 projects | /r/dataengineering | 30 Nov 2022

Check out dbt-expectations https://github.com/calogica/dbt-expectations

CI/CD in data engineering - help a noob
2 projects | /r/dataengineering | 3 Dec 2021

There are certain things I would like to add such as data quality, I can use something like dbt great expectations, but I am not sure how much more I should force it before getting an airflow setup..

How do you query and quality check data produced in intermediate steps in analytics pipeline?
1 project | /r/dataengineering | 13 Oct 2021

ETL Pipelines with Airflow: The Good, the Bad and the Ugly
7 projects | news.ycombinator.com | 8 Oct 2021

[dbt Labs employee here]
Check out dbt-expectations package[1]. It's a port of the Great Expectations checks to dbt as tests. The advantage of this is you don't need another tool for these pretty standard tests, and can be easily incorporated into dbt workflows.
[1] https://github.com/calogica/dbt-expectations

Unit testing SQL in DBT
3 projects | /r/dataengineering | 6 Feb 2021

Also check out dbt-expectations that is a port of Great Expectations that greatly expands the configurable (non-assert) tests.

What are some alternatives?

When comparing hudi and dbt-expectations you can also consider the following projects:

iceberg - Apache Iceberg

dbt-utils - Utility functions for dbt projects.

kudu - Mirror of Apache Kudu

dbt-oracle - A dbt adapter for oracle db backend

Trino - Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

materialize - The data warehouse for operational workloads.

debezium - Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

Scio - A Scala API for Apache Beam and Google Cloud Dataflow.

pinot - Apache Pinot - A realtime distributed OLAP datastore

NVTabular - NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

cuetils - CLI and library for diff, patch, and ETL operations on CUE, JSON, and Yaml
