Unit & integration testing in Databricks

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • dbx

    🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

  • Hey, Databricks person here. Check out DBX for a template on how to do unit and integration tests: https://github.com/databrickslabs/dbx

  • spark-fast-tests

    Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

  • If the majority of your stuff is not UDF-based there is an OS solution to run assertion tests against full data frames called spark-fast-tests. The idea here is similar in that you have a it notebook that calls your actual notebook against a staged input reads the output and compares it to a prefabed expected output. This does take a bit of setup and trial and error but it’s the closest I’ve been able to get to proper automated regression testing in databricks

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • databricks-nutter-projects-demo

    Discontinued Demo of using the Nutter for testing of Databricks notebooks in the CI/CD pipeline [Moved to: https://github.com/alexott/databricks-nutter-repos-demo]

  • You can also use the approach described here https://github.com/alexott/databricks-nutter-projects-demo

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts