Unit & integration testing in Databricks

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/dataengineering

Our great sponsors
  • Sonar - Write Clean Python Code. Always.
  • InfluxDB - Build time-series-based applications quickly and at scale.
  • SaaSHub - Software Alternatives and Reviews
  • dbx

    🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

    Hey, Databricks person here. Check out DBX for a template on how to do unit and integration tests: https://github.com/databrickslabs/dbx

  • spark-fast-tests

    Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

    If the majority of your stuff is not UDF-based there is an OS solution to run assertion tests against full data frames called spark-fast-tests. The idea here is similar in that you have a it notebook that calls your actual notebook against a staged input reads the output and compares it to a prefabed expected output. This does take a bit of setup and trial and error but it’s the closest I’ve been able to get to proper automated regression testing in databricks

  • Sonar

    Write Clean Python Code. Always.. Sonar helps you commit clean code every time. With over 225 unique rules to find Python bugs, code smells & vulnerabilities, Sonar finds the issues while you focus on the work.

  • databricks-nutter-projects-demo

    Demo of using the Nutter for testing of Databricks notebooks in the CI/CD pipeline [Moved to: https://github.com/alexott/databricks-nutter-repos-demo]

    You can also use the approach described here https://github.com/alexott/databricks-nutter-projects-demo

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts