Ask HN: How do you test SQL?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Scout Monitoring - Free Django app performance insights with Scout Monitoring
Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
www.scoutapm.com
featured
InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
  • pg-mem

    An in memory postgres DB instance for your unit tests

  • I was wondering the other day how to classify tests that use a test double like pg-mem, which isn't a mock but isn't the Dockerized test DB either :

    https://github.com/oguimbal/pg-mem

  • Scout Monitoring

    Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.

    Scout Monitoring logo
  • hash-db

    Experimental distributed pseudomultimodel keyvalue database (it uses python dictionaries) imitating dynamodb querying with join only SQL support, distributed joins and simple Cypher graph support and document storage

  • From an SQL database implementation perspective, in my toy Python barebones SQL database that barely supports inner joins (https://github.com/samsquire/hash-db) I tested by testing on postgresql and seeing if my query with two joins produces the same results.

    I ought to produce unit tests that prove that tuples from each join operation produces the correct dataset.

    For a user perspective, I guess you could write some tooling that loads example data into a database and does an incremental join with each part of the join statement added.

  • datajudge

    Assessing whether data from database complies with reference information.

  • https://github.com/QuantCo/datajudge

    We've also written a blog post trying to illustrate a use case:

  • dbt-unit-testing

    This dbt package contains macros to support unit testing that can be (re)used across dbt projects.

  • We use this and take an example-based tests approach for any non-trivial tables: https://github.com/EqualExperts/dbt-unit-testing

  • testcontainers-dotnet

    A library to support tests with throwaway instances of Docker containers for all compatible .NET Standard versions.

  • .NET Shop using SQL Server here, but I think something similar to what we do can apply to any stack. We use TestContainers [1] to spin up a container with SQL Server engine running on it. Then use FluentMigrator [2] to provision tables and test data to run XUnit integration tests against. This has worked remarkably well.

    [1] https://dotnet.testcontainers.org/

  • graphjin

    GraphJin - Build NodeJS / GO APIs in 5 minutes not weeks

  • In GraphJin an automatic GraphQL to SQL compiler we use the gnomock library it startups a database instance (docker) then create the schema and tests data and finally our code connects to it and runs a series of tests. We run these across Mysql, Postgres and a few other DB's. Gnomock supports a wide range of them. Right now we don't take down the db for every test only between test runs but its fast enough that we could. This whole thing runs of a simple `go test -v .` command and we run it on every commit using a githook. https://github.com/dosco/graphjin/blob/master/tests/dbint_te...

  • Flyway

    Flyway by Redgate • Database Migrations Made Easy.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • sqlfluff

    A modular SQL linter and auto-formatter with support for multiple dialects and templated code.

  • This linter can really enforce some best practices https://github.com/sqlfluff/sqlfluff

    A list of best practices:

  • sqlx

    🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite. (by launchbadge)

  • Part of the solution could be using tooling which can compile time check SQL is valid like https://github.com/launchbadge/sqlx

  • data-diff

    Discontinued Compare tables within or across databases

  • I did data engineering for 6 years and am building a company to automate SQL validation for dbt users.

    First, by “testing SQL pipelines”, I assume you mean testing changes to SQL code as part of the development workflow? (vs. monitoring pipelines in production for failures / anomalies).

    If so:

    1 – assertions. dbt comes with a solid built-in testing framework [1] for expressing assertions such as “this column should have values in the list [A,B,C]” as well checking referential integrity, uniqueness, nulls, etc. There are more advanced packages on top of dbt tests [2]. The problem with assertion testing in general though is that for a moderately complex data pipeline, it’s infeasible to achieve test coverage that would cover most possible failure scenarios.

    2 – data diff: for every change to SQL, know exactly how the code change affects the output data by comparing the data in dev/staging (built off the dev branch code) with the data in production (built off the main branch). We built an open-source tool for that: https://github.com/datafold/data-diff, and we are adding an integration with dbt soon which will make diffing as part of dbt development workflow one command away [2]

    We make money by selling a Cloud solution for teams that integrates data diff into Github/Gitlab CI and automatically diffs every pull request to tell you the how a change to SQL affects the target table you changed, downstream tables and dependent BI tools (video demo: [3])

    I’ve also written about why reliable change management is so important for data engineering and what are key best practices to implement [4]

    [1] https://docs.getdbt.com/docs/build/tests

  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

  • pg_temp

    create a temporary, disposable, userland pg database

  • Here's a nifty python module for testing against postgres that a friend wrote. It creates a temporary database on the fly for testing and tears it down at the end. You can then populate the database and run all kinds of integration tests against it.

    https://github.com/ugtar/pg_temp

  • integresql

    IntegreSQL manages isolated PostgreSQL databases for your integration tests.

  • Happy to hear that! When it comes to testing services that depend on PostgreSQL, this is still my preferred solution.

    https://github.com/allaboutapps/integresql

    disclaimer: author

  • squire

  • On a related note (though it does touch upon testing) mitchellh open sourced Squire[1] recently (today?). Note though, that it's PostgreSQL specific.

    [1] https://github.com/mitchellh/squire

  • SS-Unit

    A 100% T-SQL based unit testing framework for SQL Server

  • Back in the mid-noughties I decided to see if I could write SQL in a test-first manner (i.e. TDD). This resulted in me writing a 100% T-SQL based unit testing framework for SQL Server [1] which we then used for the SQL back-end at an investment bank.

    On the back on that professional use I wrote a blog post [2] explaining why you might choose to go down this route as it wasn't the way database was developed way back then (SQL wasn't developed in the same way as the other front-end and back-end code).

    A few years later I gave a short 20-minute talk (videoed) to show what writing SQL using TDD looked like for me. It's hard to show all the kinds of tests we wrote in practice at the bank but the talk is intended to show how rapid the feedback loop can be using a standard DB query tool and two code windows - production code and tests.

    Be kind, it was a long time ago and I'm sure the state of the art has improved a lot in the intervening years :o).

    Chris Oldwood

    ---

    [1] SQL Server Unit: https://github.com/chrisoldwood/SS-Unit

    [2] You Write Your SQL Unit Tests in SQL?: https://chrisoldwood.blogspot.com/2011/04/you-write-your-sql...

    [3] Test-Driven SQL: https://www.youtube.com/watch?v=5-MWYKLM3r0

  • spark-style-guide

    Spark style guide

  • Spark makes it easy to wrap SQL in functions that are easy to test. I am the author of the popular Scala Spark (spark-fast-tests) and PySpark (chispa) testing libraries. Some additional tips to speed up Spark tests (can speed up tests between 70-90%):

    * reuse the same Spark session throughout the test suite

    * Set shuffle partitions to 2 (instead of default which is 200)

    * Use dependency injection to avoid disk I/O in the test suite

    * Use fast DataFrame equality when possible. assertSmallDataFrameEquality is 4x faster than assertLargeDataFrameEquality.

    * Use column equality to test column functions. Don't compare DataFrames unless you're testing custom DataFrame transformations. See the spark-style-guide for definitions for these terms: https://github.com/MrPowers/spark-style-guide/blob/main/PYSP...

    Spark is an underrated tool for testing SQL. Spark makes it really easy to abstract SQL into unit testable chunks. Configuring your tests properly takes some knowledge, but you can make the tests run relatively quickly.

  • bytebase

    The GitHub/GitLab for database DevOps. World's most advanced database DevOps and CI/CD for Developer, DBA and Platform Engineering teams.

  • We built a GUI based solution called https://github.com/bytebase/bytebase. You may also take a look.

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts