Ask HN: How do you test SQL?

This page summarizes the projects mentioned and recommended in the original post.

  • pg-mem

    An in memory postgres DB instance for your unit tests

  • I was wondering the other day how to classify tests that use a test double like pg-mem, which isn't a mock but isn't the Dockerized test DB either:

  • hash-db

    Experimental distributed pseudo-multi-model key-value database (it uses Python dictionaries) imitating DynamoDB querying, with join-only SQL support, distributed joins, simple Cypher graph support, and document storage

  • From an SQL database implementation perspective: in my toy barebones Python SQL database, which barely supports inner joins, I tested by running the same query with two joins against PostgreSQL and checking that it produced the same results.

    I ought to produce unit tests proving that the tuples from each join operation produce the correct dataset.

    From a user's perspective, I guess you could write some tooling that loads example data into a database and runs the join incrementally, with each part of the join statement added in turn.
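
A rough sketch of that incremental-join idea, using Python's sqlite3 standard library (the tables and expected counts are made-up examples, not from the post):

```python
import sqlite3

# Build a throwaway in-memory database with example data.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE items     (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO customers VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders    VALUES (10, 1), (11, 1);
    INSERT INTO items     VALUES (100, 10, 'x'), (101, 11, 'y');
""")

# Run the join incrementally: add one join clause at a time and
# check the row count at each step against what we expect.
base = "SELECT COUNT(*) FROM customers c"
steps = [
    ("", 2),                                        # no joins: all customers
    (" JOIN orders o ON o.customer_id = c.id", 2),  # bob has no orders
    (" JOIN orders o ON o.customer_id = c.id"
     " JOIN items i ON i.order_id = o.id", 2),      # one item per order
]
for clause, expected in steps:
    (count,) = db.execute(base + clause).fetchone()
    assert count == expected, (clause, count)
```

Each step's expected count is worked out by hand from the fixture data, which is the point: a surprising count at step N localizes the problem to the Nth join clause.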

  • datajudge

    Assessing whether data from a database complies with reference information.


    We've also written a blog post trying to illustrate a use case:

  • dbt-unit-testing

    This dbt package contains macros to support unit testing that can be (re)used across dbt projects.

  • We use this and take an example-based testing approach for any non-trivial tables:

  • testcontainers-dotnet

    A library to support tests with throwaway instances of Docker containers for all compatible .NET Standard versions.

  • .NET shop using SQL Server here, but I think something similar to what we do can apply to any stack. We use Testcontainers [1] to spin up a container with the SQL Server engine running on it, then use FluentMigrator [2] to provision tables and test data to run xUnit integration tests against. This has worked remarkably well.


  • graphjin

    GraphJin - Build Node.js / Go APIs in 5 minutes, not weeks

  • In GraphJin, an automatic GraphQL-to-SQL compiler, we use the gnomock library: it starts up a database instance (Docker), then creates the schema and test data, and finally our code connects to it and runs a series of tests. We run these across MySQL, Postgres, and a few other DBs; gnomock supports a wide range of them. Right now we don't take down the DB for every test, only between test runs, but it's fast enough that we could. This whole thing runs off a simple `go test -v .` command, and we run it on every commit using a git hook.

  • Flyway

    Flyway by Redgate • Database Migrations Made Easy.

  • sqlfluff

    A modular SQL linter and auto-formatter with support for multiple dialects and templated code.

  • This linter can really enforce some best practices

    A list of best practices:

  • sqlx

    🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite. (by launchbadge)

  • Part of the solution could be using tooling that can check at compile time that your SQL is valid, like

  • data-diff

    Discontinued Compare tables within or across databases

  • I did data engineering for 6 years and am building a company to automate SQL validation for dbt users.

    First, by “testing SQL pipelines”, I assume you mean testing changes to SQL code as part of the development workflow? (vs. monitoring pipelines in production for failures / anomalies).

    If so:

    1 – assertions. dbt comes with a solid built-in testing framework [1] for expressing assertions such as “this column should have values in the list [A,B,C]”, as well as checking referential integrity, uniqueness, nulls, etc. There are more advanced packages built on top of dbt tests [2]. The problem with assertion testing in general, though, is that for a moderately complex data pipeline it’s infeasible to achieve test coverage for most possible failure scenarios.

    2 – data diff: for every change to SQL, know exactly how the code change affects the output data by comparing the data in dev/staging (built off the dev branch) with the data in production (built off the main branch). We built an open-source tool for that, and we are adding an integration with dbt soon which will make diffing as part of the dbt development workflow one command away [2]

    We make money by selling a Cloud solution for teams that integrates data diff into GitHub/GitLab CI and automatically diffs every pull request to tell you how a change to SQL affects the target table you changed, downstream tables, and dependent BI tools (video demo: [3])

    I’ve also written about why reliable change management is so important for data engineering and what the key best practices to implement are [4]
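
To make the two approaches concrete, here is a minimal sketch in plain Python with sqlite3 (the table and column names are invented; dbt and data-diff express the same ideas declaratively). A dbt-style test is a query for "bad rows" that must return nothing, and a data diff compares the same table built from two branches:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders_prod (id INTEGER, status TEXT);
    CREATE TABLE orders_dev  (id INTEGER, status TEXT);
    INSERT INTO orders_prod VALUES (1, 'paid'), (2, 'refunded'), (3, 'paid');
    INSERT INTO orders_dev  VALUES (1, 'paid'), (2, 'paid');  -- change under review
""")

# 1 - assertion testing, dbt-style: a test passes when the
# "bad rows" query comes back empty.
def assert_empty(sql):
    bad = db.execute(sql).fetchall()
    assert not bad, bad

assert_empty("SELECT * FROM orders_dev WHERE id IS NULL")                  # not_null
assert_empty("SELECT id FROM orders_dev GROUP BY id HAVING COUNT(*) > 1")  # unique
assert_empty("SELECT * FROM orders_dev"
             " WHERE status NOT IN ('paid', 'refunded')")                  # accepted_values

# 2 - data diff: rows that differ between the prod and dev builds.
removed = set(db.execute("SELECT * FROM orders_prod EXCEPT SELECT * FROM orders_dev"))
added   = set(db.execute("SELECT * FROM orders_dev  EXCEPT SELECT * FROM orders_prod"))
assert removed == {(2, 'refunded'), (3, 'paid')}
assert added == {(2, 'paid')}
```

The diff half illustrates the coverage argument above: no assertion flagged the change, yet the diff shows that row 2's status flipped and row 3 disappeared.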


  • fugue

    A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

  • pg_temp

    create a temporary, disposable, userland pg database

  • Here's a nifty Python module for testing against Postgres that a friend wrote. It creates a temporary database on the fly for testing and tears it down at the end. You can then populate the database and run all kinds of integration tests against it.
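
pg_temp itself targets PostgreSQL; a rough sketch of the same create-populate-test-teardown pattern, shown here with sqlite so the snippet stays self-contained:

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_db():
    """Create a throwaway on-disk database, hand a connection to the
    test, and delete the file afterwards (the pattern pg_temp applies
    to PostgreSQL, illustrated with sqlite)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        conn.close()
        os.remove(path)  # teardown: nothing survives the test

# Usage: populate the temporary database, then test against it.
with temp_db() as db:
    db.execute("CREATE TABLE t (x INTEGER)")
    db.execute("INSERT INTO t VALUES (42)")
    (x,) = db.execute("SELECT x FROM t").fetchone()
    assert x == 42
```

Because every test gets a fresh database and the teardown is in a `finally` block, tests stay isolated even when one of them fails midway.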

  • integresql

    IntegreSQL manages isolated PostgreSQL databases for your integration tests.

  • Happy to hear that! When it comes to testing services that depend on PostgreSQL, this is still my preferred solution.

    disclaimer: author

  • squire

  • On a related note (though it does touch upon testing), mitchellh open-sourced Squire [1] recently (today?). Note, though, that it's PostgreSQL-specific.


  • SS-Unit

    A 100% T-SQL based unit testing framework for SQL Server

  • Back in the mid-noughties I decided to see if I could write SQL in a test-first manner (i.e. TDD). This resulted in me writing a 100% T-SQL based unit testing framework for SQL Server [1] which we then used for the SQL back-end at an investment bank.

    On the back of that professional use I wrote a blog post [2] explaining why you might choose to go down this route, as it wasn't the way databases were developed back then (SQL wasn't developed in the same way as the other front-end and back-end code).

    A few years later I gave a short, videoed 20-minute talk [3] to show what writing SQL using TDD looked like for me. It's hard to show all the kinds of tests we wrote in practice at the bank, but the talk is intended to show how rapid the feedback loop can be using a standard DB query tool and two code windows: production code and tests.

    Be kind, it was a long time ago and I'm sure the state of the art has improved a lot in the intervening years :o).

    Chris Oldwood


    [1] SQL Server Unit:

    [2] You Write Your SQL Unit Tests in SQL?:

    [3] Test-Driven SQL:
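
The test-first loop described above can be sketched as follows (sqlite standing in for SQL Server, and the schema and view are invented for illustration): write the expectation first, then the SQL object under test, then assert.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Arrange: schema and fixture data the object under test will read.
db.executescript("""
    CREATE TABLE trades (id INTEGER, qty INTEGER, price REAL);
    INSERT INTO trades VALUES (1, 10, 2.5), (2, -4, 3.0);
""")

# Test first: the expected output, written before the view exists.
expected = [(1, 25.0), (2, -12.0)]

# "Production" code under test: a view computing each trade's value.
db.execute("CREATE VIEW trade_value AS"
           " SELECT id, qty * price AS value FROM trades")

# Assert: compare the view's output with the expectation.
actual = db.execute("SELECT id, value FROM trade_value ORDER BY id").fetchall()
assert actual == expected, actual
```

Running the assertion before creating the view fails (the view doesn't exist), then passes once the view is written, which is the red-green rhythm the talk demonstrates.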

  • spark-style-guide

    Spark style guide

  • Spark makes it easy to wrap SQL in functions that are easy to test. I am the author of the popular Scala Spark (spark-fast-tests) and PySpark (chispa) testing libraries. Some additional tips that can speed up Spark tests by 70-90%:

    * reuse the same Spark session throughout the test suite

    * Set shuffle partitions to 2 (instead of the default, which is 200)

    * Use dependency injection to avoid disk I/O in the test suite

    * Use fast DataFrame equality when possible. assertSmallDataFrameEquality is 4x faster than assertLargeDataFrameEquality.

    * Use column equality to test column functions. Don't compare DataFrames unless you're testing custom DataFrame transformations. See the spark-style-guide for definitions of these terms:

    Spark is an underrated tool for testing SQL. Spark makes it really easy to abstract SQL into unit testable chunks. Configuring your tests properly takes some knowledge, but you can make the tests run relatively quickly.

  • bytebase

    The GitHub/GitLab for database DevOps. World's most advanced database DevOps and CI/CD for Developer, DBA and Platform Engineering teams.

  • We built a GUI-based solution, Bytebase. You may also take a look.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts