Gleb, Alex, Erez and Simon here – we are building an open-source tool for comparing data within and across databases at any scale. The repo is at https://github.com/datafold/data-diff, and our home page is https://datafold.com/.
As a company, Datafold builds tools for data engineers to automate the most tedious and error-prone tasks that fall through the cracks of the modern data stack, such as data testing and lineage. We launched two years ago with a tool for regression-testing changes to ETL code (https://news.ycombinator.com/item?id=24071955). It compares the data produced before and after the code change and shows the impact on values, aggregate metrics, and downstream data applications.
While working with many customers on improving their data engineering experience, we kept hearing that they needed to diff their data across databases to validate data replication between systems.
There were 3 main use cases for such replication:
* To perform analytics on transactional data in an OLAP engine (e.g. PostgreSQL > Snowflake)
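To make the idea of validating replication concrete, here is a minimal sketch of a cross-database diff. It is not data-diff's algorithm or API: the helpers `row_checksums`, `diff_tables`, and `make_db`, the `users` table, and the use of two in-memory SQLite databases as stand-ins for a source and a replica are all assumptions made for illustration. It hashes each row by primary key and reports keys that are missing, extra, or changed on the target side.

```python
import hashlib
import sqlite3


def row_checksums(conn, table, key):
    """Map each primary-key value to a hash of the full row (hypothetical helper)."""
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cursor.description]
    key_index = columns.index(key)
    checksums = {}
    for row in cursor:
        digest = hashlib.md5(repr(row).encode("utf-8")).hexdigest()
        checksums[row[key_index]] = digest
    return checksums


def diff_tables(source, target, table, key="id"):
    """Compare one table across two connections by per-row checksum."""
    src = row_checksums(source, table, key)
    dst = row_checksums(target, table, key)
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "extra_in_target": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


def make_db(rows):
    """Build an in-memory SQLite database standing in for a real system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "ann"), (2, "bob"), (3, "cy")])
target = make_db([(1, "ann"), (2, "bobby"), (4, "dee")])
result = diff_tables(source, target, "users")
print(result)
```

This naive version pulls every row out of both databases, which is fine for a small table but would not scale to large ones; it is only meant to show what "diffing across databases" checks.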
Related posts
- Data-diff v0.3: DuckDB, efficient in-database diffing and more
- data-diff VS cuallee - a user suggested alternative (30 Nov 2022)
- Compare identical tables across databases to identify data differences (Oracle 19c)
- How to test Data Ingestion Pipeline
- Data migration - easier way to compare legacy with new environment?