How do you test your pipelines?

monosi

20 320 0.0 Python

Open source data observability platform

As mentioned in other comments, dbt tests are one way to go about it, usually hooked up to Airflow or some other scheduler. There’s also an open-source package under active development for monitoring data quality and validating some of the parameters you described: https://github.com/monosidev/monosi
soda-spark

1 60 0.0 Python

Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes

Since you already have Spark set up, perhaps it would be easier to build DataFrames by loading data from the different tables and validate them in one go? You can give soda-spark a try (disclosure: I'm one of the developers). It lets you specify your checks declaratively in YAML and run the validations in Spark jobs.
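To make the "declare checks, then run them in one go" idea concrete, here is a minimal sketch in plain Python. It is not soda-spark's API: the check format, the metric names (`row_count`, `missing_count`, `min`), and the helpers `compute_metric` and `run_checks` are all hypothetical, and the rows are plain dicts standing in for a DataFrame.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]

# Hypothetical declarative checks: a metric, an optional column, and a pass condition.
# In a real setup these would typically live in a YAML file rather than in code.
CHECKS: list[dict[str, Any]] = [
    {"name": "row_count > 0", "metric": "row_count", "column": None,
     "passes": lambda v: v > 0},
    {"name": "missing_count(id) == 0", "metric": "missing_count", "column": "id",
     "passes": lambda v: v == 0},
    {"name": "min(amount) >= 0", "metric": "min", "column": "amount",
     "passes": lambda v: v >= 0},
]


def compute_metric(rows: list[Row], metric: str, column: Optional[str]) -> Any:
    """Compute one metric over the rows; returns None when it is undefined."""
    if metric == "row_count":
        return len(rows)
    values = [row.get(column) for row in rows]
    if metric == "missing_count":
        return sum(1 for v in values if v is None)
    if metric == "min":
        present = [v for v in values if v is not None]
        return min(present, default=None)
    raise ValueError(f"unknown metric: {metric}")


def run_checks(rows: list[Row], checks: list[dict[str, Any]]) -> dict[str, bool]:
    """Evaluate every check against the rows and report pass/fail by check name."""
    results: dict[str, bool] = {}
    for check in checks:
        value = compute_metric(rows, check["metric"], check["column"])
        passes: Callable[[Any], bool] = check["passes"]
        # An undefined metric (for example min over no values) counts as a failure.
        results[check["name"]] = value is not None and passes(value)
    return results


if __name__ == "__main__":
    sample = [{"id": 1, "amount": 10.0}, {"id": None, "amount": -5.0}]
    for name, ok in run_checks(sample, CHECKS).items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
```

The point of keeping the checks as data is that the same runner can be reused for every table in the pipeline, and a failing check can fail the scheduled job.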
soda-sql

25 50 8.2 Python

Discontinued. Data profiling, testing, and monitoring for SQL-accessible data.

You can also use soda-sql to do checks on your warehouses separately. Both Soda SQL and Soda Spark are OSS/Apache licensed.
