The sad state of property-based testing libraries

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • hedgehog

    Release with confidence, state-of-the-art property testing for Haskell.

    when realizing that those numbers don't come easily. [3]

    [1]: https://hackage.haskell.org/package/hedgehog-1.4/docs/src/He...

    [2]: https://gee.cs.oswego.edu/dl/papers/oopsla14.pdf

    [3]: https://github.com/hedgehogqa/haskell-hedgehog/issues/191

    Wonder why most property-testing libraries don't have features like this?

    The libraries require training to use. And they're not that easy to write.

    > the current state-of-the-art when it comes to property-based testing is stateful testing via a state machine model and reusing the same sequential state machine model combined with linearisability to achieve parallel testing

    Okay, okay. I admit I've never performed property-based stateful testing, nor in parallel. So that may be the coolest feature out there, because it addresses one of the hardest problems in testing.
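
    In that quote, linearisability means that a concurrent history of operations can be explained by some sequential ordering of the same operations that (1) respects real-time order and (2) is accepted by the sequential model. A brute-force checker for tiny histories might look like the minimal Python sketch below. The `Op` record and the single-register model are illustrative assumptions, not any library's API:

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional


@dataclass(frozen=True)
class Op:
    # One completed call observed during a parallel run (hypothetical record).
    start: int              # timestamp when the call was invoked
    end: int                # timestamp when the call returned
    kind: str               # "write" or "read"
    value: Optional[int]    # value written, or value the read returned


def respects_real_time(order):
    # If b returned before a was invoked, b must not be ordered after a.
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b.end < a.start:
                return False
    return True


def model_accepts(order, initial=0):
    # Sequential model: a single integer register.
    state = initial
    for op in order:
        if op.kind == "write":
            state = op.value
        elif op.value != state:
            return False
    return True


def is_linearizable(history):
    # Brute force over all orderings: only viable for the handful of ops
    # a small parallel test produces.
    return any(
        respects_real_time(order) and model_accepts(order)
        for order in permutations(history)
    )


# Overlapping write(1) and read() that saw 1: linearizable.
ok = [Op(0, 3, "write", 1), Op(1, 2, "read", 1)]
# read() saw 1 although it finished before write(1) started: not linearizable.
bad = [Op(0, 1, "read", 1), Op(2, 3, "write", 1)]
print(is_linearizable(ok), is_linearizable(bad))  # True False
```

    Real checkers prune this search instead of enumerating every permutation, but the acceptance condition is the same.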

    But I think other things have also happened with modern property-testing libraries (e.g. Hypothesis, PropEr, Hedgehog, Validity):

    Shrinking for free [4], generators for free [5], and defining the probability distribution of your sub-generators in a composable way.

    Maybe those features are not as significant, but they're equally missing from almost all property-testing libraries.
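
    Composable distributions are easy to picture if generators are plain functions of a PRNG. Here is a minimal Python sketch. The combinator names `integers`, `constant`, `frequency` and `list_of` are hypothetical, loosely modelled on the `frequency` combinator found in QuickCheck-style libraries:

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Gen = Callable[[random.Random], T]   # a generator is just "PRNG -> value"


def integers(lo: int, hi: int) -> Gen[int]:
    return lambda rng: rng.randint(lo, hi)


def constant(x: T) -> Gen[T]:
    return lambda rng: x


def frequency(weighted: Sequence[Tuple[int, Gen[T]]]) -> Gen[T]:
    # Pick a sub-generator with probability proportional to its weight, then run it.
    # The result is itself a Gen, so weighted choices nest freely.
    weights = [w for w, _ in weighted]
    gens = [g for _, g in weighted]
    return lambda rng: rng.choices(gens, weights=weights)[0](rng)


def list_of(gen: Gen[T], max_len: int) -> Gen[List[T]]:
    return lambda rng: [gen(rng) for _ in range(rng.randint(0, max_len))]


# Mostly small numbers, sometimes zero, occasionally 32-bit extremes,
# all reused inside a list generator.
edge_heavy_int = frequency([
    (8, integers(-10, 10)),
    (1, constant(0)),
    (1, frequency([(1, constant(2**31 - 1)), (1, constant(-2**31))])),
])
samples = list_of(edge_heavy_int, 5)(random.Random(42))
```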

    [4]: Gens N’ Roses: Appetite for Reduction • Jacob Stanley • YOW! 2017 https://www.youtube.com/watch?v=LfD0DHqpeVQ

    [5]: https://tech.fpcomplete.com/blog/quickcheck-hedgehog-validit...

  • coyote

    Coyote is a library and tool for testing concurrent C# code and deterministically reproducing bugs.

    I believe Java has exactly that, but I couldn't recall the project name.

    In the case of .NET, there is https://github.com/microsoft/coyote which works by rewriting IL to inject hooks that control concurrent execution state.

    It would have been much more expensive to have a custom interpreter specifically for this task (CoreCLR never interprets IL, it compiles it immediately to unoptimized machine code for fast startup, and then recompiles it one or multiple times as the method gets executed more and as JIT gathers its execution profile for PGO-driven compilation).

    This approach somewhat reminds me of a precise release-mode debugging and tracing framework for C++ that a friend of mine was talking about, which relies on either manually adding the hooks to source files or doing so automatically with a tool.
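
    The core idea of injecting hooks into concurrent code can be imitated in a few lines. If every shared-memory access is a point where a task yields to a test-controlled scheduler, then a seeded PRNG decides the interleaving, and any failing schedule can be replayed from its seed. This is a toy Python sketch that uses generators as tasks. It is not how Coyote rewrites IL, and `incrementer`, `run_with_seed` and `find_lost_update` are hypothetical names:

```python
import random


def incrementer(shared):
    # Non-atomic read-modify-write; each `yield` stands in for an injected hook.
    value = shared["counter"]
    yield
    shared["counter"] = value + 1
    yield


def run_with_seed(seed, n_tasks=2):
    rng = random.Random(seed)
    shared = {"counter": 0}
    tasks = [incrementer(shared) for _ in range(n_tasks)]
    while tasks:
        task = rng.choice(tasks)  # the test's scheduler, not the OS, picks who runs next
        try:
            next(task)
        except StopIteration:
            tasks.remove(task)
    return shared["counter"]


def find_lost_update(n_tasks=2, attempts=1000):
    # Explore many schedules; return the first seed whose interleaving loses an update.
    for seed in range(attempts):
        if run_with_seed(seed, n_tasks) != n_tasks:
            return seed
    return None


seed = find_lost_update()
print("failing seed:", seed)  # run_with_seed(seed) replays exactly the same schedule
```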

  • CsCheck

    Random testing library for C#

    For my C#/.NET testing I've been using CsCheck[0] and I've enjoyed it quite a bit. It was a lot more approachable compared to Hedgehog and FsCheck and is also quite fast.

    [0] https://github.com/AnthonyLloyd/CsCheck

  • buf-list

    A list of Rust buffers that implements the bytes::Buf trait.

    I write stateful property tests with Rust's proptest quite regularly; I just tend to hand-code them, which is quite straightforward. See https://github.com/sunshowers-code/buf-list/blob/main/src/cu... for a nontrivial example which found 6 bugs.
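
    Hand-coding a stateful property test mostly means three things: generate a random sequence of operations, apply each one both to the implementation and to a trivially correct model, and compare the results as you go. Below is a minimal Python sketch of that shape. The `RingBuffer` under test and its deliberate bug are hypothetical, not buf-list's cursor. A library like proptest would also shrink the failing sequence, which is omitted here:

```python
import random
from collections import deque


class RingBuffer:
    """Bounded FIFO under test (hypothetical). Spec: push on a full buffer is ignored."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0
        self.size = 0

    def push(self, x):
        # BUG: no full check, so pushing onto a full buffer clobbers the oldest item.
        self.buf[(self.head + self.size) % len(self.buf)] = x
        self.size = min(self.size + 1, len(self.buf))

    def pop(self):
        if self.size == 0:
            return None
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return x


def check_against_model(seed, capacity=3, steps=50):
    # Returns the failing operation sequence, or None if the run agreed with the model.
    rng = random.Random(seed)
    sut, model = RingBuffer(capacity), deque()
    ops = []
    for _ in range(steps):
        if rng.random() < 0.6:
            x = rng.randint(0, 99)
            ops.append(("push", x))
            sut.push(x)
            if len(model) < capacity:     # the model follows the spec
                model.append(x)
        else:
            ops.append(("pop",))
            expected = model.popleft() if model else None
            if sut.pop() != expected:
                return ops
    return None


for seed in range(100):
    failing = check_against_model(seed)
    if failing:
        print(f"seed {seed} fails after {len(failing)} ops: {failing[-5:]}")
        break
```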

  • shuttle

    Shuttle is a library for testing concurrent Rust code (by awslabs)

    i.e., top-level true randomness, then a bunch of nested loops (only 2 here, but some tests have more) to go from low-complexity cases to high-complexity ones,

    then generate a seed for a deterministic PRNG and print it out, so if the test fails, I just copy and paste the error seed to replay the error case.

    I have found doing this manual proptesting to be faster, more flexible, and generally less fuss than using any frameworks or libraries.
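
    The pattern just described could look roughly like the Python sketch below. It uses a truly random top-level seed, nested loops that go from small to large cases, and a per-case seed printed on failure so the case can be replayed. The property (`reverse(reverse(xs)) == xs` on a hypothetical `my_reverse`) and the loop bounds are illustrative assumptions:

```python
import random
import secrets


def my_reverse(xs):
    # Function under test (hypothetical).
    return xs[::-1]


def check_case(seed, length, max_value):
    # Everything inside a case derives from `seed`, so a failure is replayable
    # by calling check_case with the values printed in the assertion message.
    rng = random.Random(seed)
    xs = [rng.randint(-max_value, max_value) for _ in range(length)]
    assert my_reverse(my_reverse(xs)) == xs, (
        f"replay with check_case({seed}, {length}, {max_value}); input was {xs}"
    )


def run(iterations_per_level=20):
    top = random.Random(secrets.randbits(64))   # top-level true randomness
    for length in (0, 1, 2, 5, 10, 100):        # low complexity first...
        for max_value in (1, 10, 1000):         # ...then widen the value range
            for _ in range(iterations_per_level):
                check_case(top.getrandbits(64), length, max_value)


run()  # raises AssertionError with a replay hint if the property ever fails
```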

    That said, for really robust concurrency testing, I cannot recommend enough the AWS Shuttle library (https://github.com/awslabs/shuttle) which can find insanely complicated race conditions. I wrote a little tutorial on it here: https://grantslatton.com/shuttle

    We used it at AWS to verify the custom filesystem we wrote to power AWS S3.

  • aa

    Immutable AA Trees (by ncruces)

    With the advent of coverage-based fuzzing, and how well supported it is in Go, what am I missing by not using one of the property-based testing libraries?

    https://www.tedinski.com/2018/12/11/fuzzing-and-property-tes...

    For example, with the fuzz test below and its corresponding invariant checks, is that equivalent to a property test?
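
    Structurally, a fuzz target with invariant checks and a property test have the same shape: both are functions from some input to pass/fail. The difference lies in how inputs are produced. A coverage-guided fuzzer mutates raw bytes according to which branches they reach, while a property-testing library builds typed values from generators and usually shrinks failures. Here is a minimal Python sketch of one target driven both ways. The sorted-set target is hypothetical, and the byte driver is plain random, without coverage feedback:

```python
import bisect
import random


def sorted_set_target(data: bytes) -> None:
    # Fuzz-style target: decode raw bytes into inserts, then check invariants.
    items = []
    for byte in data:
        x = byte % 64
        i = bisect.bisect_left(items, x)
        if i == len(items) or items[i] != x:    # set semantics: skip duplicates
            items.insert(i, x)
        assert all(lo < hi for lo, hi in zip(items, items[1:])), "not strictly sorted"
    assert set(items) == {byte % 64 for byte in data}, "lost or invented an element"


def fuzz_like(iterations=1000, seed=0):
    # Driver 1: arbitrary byte strings, as a fuzzer (minus coverage feedback) would send.
    rng = random.Random(seed)
    for _ in range(iterations):
        sorted_set_target(bytes(rng.randrange(256) for _ in range(rng.randrange(32))))


def property_like(iterations=1000, seed=0):
    # Driver 2: generate typed values in the valid range, then encode them for the target.
    rng = random.Random(seed)
    for _ in range(iterations):
        xs = [rng.randint(0, 63) for _ in range(rng.randint(0, 31))]
        sorted_set_target(bytes(xs))


fuzz_like()
property_like()
```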

    https://github.com/ncruces/aa/blob/505cbbf94973042cc7af4d6be...

    https://github.com/ncruces/aa/blob/505cbbf94973042cc7af4d6be...

  • libprotobuf-mutator

    Library for structured fuzzing with protobuffers

    [3]: https://github.com/google/libprotobuf-mutator

  • noblit

    An immutable append-only database

  • fsharp-hedgehog

    Release with confidence, state-of-the-art property testing for .NET.

    Haha yeah I kinda went off the deep end with applicatives. Here's a short primer on applicative vs monadic shrinking behavior using F# syntax https://github.com/hedgehogqa/fsharp-hedgehog/issues/419#iss...

    You can think of `let!` as `let + await`, and `let! x ... and! y` as `await Parallel([x, y])`.

    Please feel free to ask any questions if it's still confusing!
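
    The difference is easiest to see with rose-tree ("integrated") shrinking, roughly the scheme Hedgehog uses. With monadic bind, the shrinks of the first value are tried before those of the second, and once the shrinker commits to a smaller second value it can never return to the first. An applicative pair keeps both shrink trees, so it can keep alternating between them. What follows is a minimal Python sketch, not Hedgehog's code. Real libraries re-run the inner generator with the same seed when the outer value shrinks; this toy models that by making the inner tree independent of `x`:

```python
from dataclasses import dataclass
from itertools import chain
from typing import Callable, Iterator


@dataclass
class Tree:
    value: object
    children: Callable[[], Iterator["Tree"]]   # lazily computed shrinks


def shrink_candidates(n):
    # Towards 0 by halving the distance: 9 -> [0, 5, 7, 8].
    out, d = [], n
    while d > 0:
        out.append(n - d)
        d //= 2
    return out


def int_tree(n):
    return Tree(n, lambda: (int_tree(c) for c in shrink_candidates(n)))


def map_tree(t, f):
    return Tree(f(t.value), lambda: (map_tree(c, f) for c in t.children()))


def bind_tree(t, k):
    # Monadic: outer shrinks first, then only the inner tree's shrinks.
    inner = k(t.value)
    return Tree(inner.value,
                lambda: chain((bind_tree(c, k) for c in t.children()), inner.children()))


def zip_tree(ta, tb):
    # Applicative: either side may shrink while the other keeps its whole tree.
    return Tree((ta.value, tb.value),
                lambda: chain((zip_tree(a, tb) for a in ta.children()),
                              (zip_tree(ta, b) for b in tb.children())))


def shrink(tree, fails):
    # Greedy: move to the first child that still fails, until none does.
    while True:
        for child in tree.children():
            if fails(child.value):
                tree = child
                break
        else:
            return tree.value


fails = lambda p: p[0] > 0 and p[0] >= p[1]   # violates "x == 0 or x < y"
monadic = bind_tree(int_tree(9), lambda x: map_tree(int_tree(5), lambda y: (x, y)))
applicative = zip_tree(int_tree(9), int_tree(5))
print(shrink(monadic, fails), shrink(applicative, fails))  # (5, 0) (1, 0)
```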

  • VisualFSharp

    The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio

    Not quite accurate with the Parallel example. Don Syme is explicit that applicative `async` should not implicitly start work in the thread pool (https://github.com/dotnet/fsharp/issues/10301#issuecomment-7...).

  • arbtest

    A minimalist property-based testing library

    Do you have a model problem which is tricky to shrink?

    I implemented a stupid simple shrinker for arbitrary, and I’d love to know a specific example where it fails to shrink in a good way:

    https://github.com/matklad/arbtest/blob/0191f93846e9f7e38254...

    I know at least two interesting approaches for making that way smarter, but I don't yet have a problem where my dumb approach isn't sufficient.
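
    For context on byte-driven generators like `arbitrary`: every value is decoded from a byte buffer, so one generic way to shrink is to shrink the buffer itself. The shrinker drops chunks and lowers bytes, keeping any smaller buffer that still fails; Hypothesis popularised this approach. The minimal Python sketch below shows that generic idea. It is not arbtest's implementation, and `decode` and the failing property are hypothetical:

```python
def decode(buf: bytes):
    # Hypothetical generator: first byte picks the length (0-7), the rest are elements.
    # Reading past the end yields 0, so every buffer decodes to some value.
    def byte(i):
        return buf[i] if i < len(buf) else 0
    n = byte(0) % 8
    return [byte(1 + i) for i in range(n)]


def shrink_buffer(buf: bytes, fails) -> bytes:
    # Repeat greedy passes until nothing changes: delete chunks, then lower single bytes.
    changed = True
    while changed:
        changed = False
        for size in (8, 4, 2, 1):
            i = 0
            while i < len(buf):
                cand = buf[:i] + buf[i + size:]
                if fails(decode(cand)):
                    buf, changed = cand, True
                else:
                    i += 1
        for i in range(len(buf)):
            for smaller in (0, buf[i] // 2, buf[i] - 1):
                if 0 <= smaller < buf[i]:
                    cand = buf[:i] + bytes([smaller]) + buf[i + 1:]
                    if fails(decode(cand)):
                        buf, changed = cand, True
                        break
    return buf


fails = lambda xs: any(x >= 50 for x in xs)
original = bytes([5, 10, 200, 30, 7, 99, 1, 2, 3])
small = shrink_buffer(original, fails)
print(decode(original), "->", decode(small))  # [10, 200, 30, 7, 99] -> [50]
```

    Every accepted step either shortens the buffer or lowers a byte, so the loop terminates.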

  • hypothesis

    Hypothesis is a powerful, flexible, and easy to use library for property-based testing.

    Anecdotally, I had a fantastic experience with `clojure.spec.alpha` (with or without `test.check`), and when I went to use python's `hypothesis` it was just... abysmal.

    It seems like Hypothesis is unable to handle simple but "large" data sets >>by design<<, where "large" is really not so large. [0] It was such a pain that we ripped out Hypothesis (and generative testing altogether, sadly) completely from our python test suite at work.

    [0] https://github.com/HypothesisWorks/hypothesis/issues/3493

  • fuzzcheck-rs

    Modular, structure-aware, and feedback-driven fuzzing engine for Rust functions

    Agreed. A while back I played around with fuzzcheck [1], which lets you write coverage-guided, structure-aware property tests, and the generation is smarter than just slamming a fuzzer's `&[u8]` input into `Arbitrary`. It also supports shrinking, which is nice. I don't know that I would recommend it, though. It seemed difficult to write your own `Mutator`s. It also looks somewhat unmaintained nowadays, but I think the direction is worth exploring.

    [1]: https://github.com/loiclec/fuzzcheck-rs/

  • sanitizers

    AddressSanitizer, ThreadSanitizer, MemorySanitizer

    I don't think the problem is primarily a lack of good OSS implementations or UX lacking polish. To me, it's more that there is a mismatch between what programming language/CS enthusiasts find interesting and useful, and what industry software developers find useful.

    First off, the "training" problem is a very real and fundamental blocker. I think academics or less experienced engineers see it as a problem of education, but I believe it's actually an engineering/organizational problem: for a company or even an OSS project, when you use niche/unadopted tools like this, you either need to provide training for them or you massively shrink the set of people able to work on your project by requiring or expecting them to learn the tools on their own. This introduces a practical headache bigger than the ones a tool like this solves (since these are just incrementally better ways to test software): you get fewer contributions or need to spend more on getting people onboarded. Note that even if "training" people is free in the sense that there is no paid training material, you still pay a cost in lower productivity for new people and in the time spent training.

    Even once people are trained, you now have a process/institutional need to support the property-based tests you wrote. That may mean paying for software licenses, or maintaining whatever infrastructure/integrations you write to get property-based testing working in your developer workflow. You also now need to rely on people using these tools correctly. The problem with formal verification is that it doesn't verify that you're actually solving the real problem correctly, just that your program operates as expected (i.e., it only verifies that your software is Working as Implemented, not Working as Intended). The more you rely on average Joes to wield complex things like this well, the more headaches you introduce: if the bottom 10% of people you hire use it sloppily or wrongly, once you're past a few dozen regular contributors you're basically going to have constant issues. You see this all the time even with regular unit and integration tests: less capable developers constantly introduce flakes or write tests so bad/basic that it'd be better for them not to write tests at all.

    Even if, after considering all that, it still seems a worthy tradeoff, there's the question of whether property-based testing solves the problem you think it does. As I mentioned, it verifies that software is Working as Implemented more than that it is Working as Intended. But there is a bigger problem: major software projects don't conform to the simpler use cases that sell people on using this tool. In my experience, stateless functions almost never have bugs, and when they do, it's mostly a "requirements discovery" problem you wouldn't catch with any kind of verification tooling.

    Stateful verification becomes a problem in the kinds of real systems programming projects, like the Linux kernel or QEMU, that stand to benefit most from verification. If you're verifying things at the level of a "higher level" abstraction like a scheduler, you almost always have really high dimensionality/a lot of state, perhaps deceptively more than you think, because of composition. You also tend to have a dev loop that is already very slow: builds take a long time, and running tests of higher-level components can be "expensive" because it may take 100ms-1s for something like a kernel to finish starting in a test environment. For a single test that's nothing, but multiplied across all tests it already makes testing painful. Adding 3x or 100x by doing property-based testing with new tests, even if pruned before submitting, could be either hugely expensive or way too slow. Similarly, the extra resources needed to maintain state machines can be very expensive. TSAN does essentially parallel property-based testing and keeps 128-512 bits (16-64 bytes) of shadow state for every 8 bytes of application memory: https://github.com/google/sanitizers/wiki/ThreadSanitizerAlg....

    Is it cool and underrated? Definitely. But it's not a silver bullet and honestly just isn't worth it for most software. Even when it is worth it, it has its drawbacks.

  • test.contract

    quickcheck of stateful protocols

    Clojure does have a stateful quickcheck library now: https://github.com/griffinbank/test.contract

    Parallel testing is interesting, but hasn't been a large source of pain yet.

  • osv

    Discontinued Open source vulnerability DB and triage service. [Moved to: https://github.com/google/osv.dev] (by google)

    TLA+, Formal Methods in Python: FizzBee, Nagini, Deal-solver, Dafny: https://news.ycombinator.com/item?id=39938759 :

    > Python is Turing complete, but does [TLA,] need to be? Is there an in-Python syntax that can be expanded in place by tooling for pretty diffs; How much overlap between existing runtime check DbC decorators and these modeling primitives and feature extraction transforms should there be? (In order to: minimize cognitive overload for human review; sufficiently describe the domains, ranges, complexity costs, inconstant timings, and the necessary and also the possible outcomes given concurrency,)

    From "S2n-TLS – A C99 implementation of the TLS/SSL protocol" https://news.ycombinator.com/item?id=38510025 :

    > But formal methods (and TLA+ for distributed computation) don't eliminate side channels. [in CPUs e.g. with branch prediction, GPUs, TPUs/NPUs, Hypervisors, OS schedulers, IPC,]

    Still, there's coverage-based fuzzing:

    From https://news.ycombinator.com/item?id=30786239 :

    > OSS-Fuzz runs ClusterFuzz[Lite?] for many open source repos and feeds OSV OpenSSF Vulnerability Format: https://github.com/google/osv#current-data-sources

    From "Automated Unit Test Improvement using Large Language Models at Meta"

  • awesome-python-testing

    Collection of awesome 😎️ Python resources for testing

    https://news.ycombinator.com/item?id=39416628 :

    > "Fuzz target generation using LLMs" (2023), OSSF//fuzz-introspector*

    > gh topic: https://github.com/topics/coverage-guided-fuzzing

    The Fuzzing computational task is similar to the Genetic Algorithm computational task, in that both explore combinatorial search spaces of potentially unbounded size, so there is a need for parallelism and therefore for partitioning for distributed computation. (But there is no computational oracle to predict whether any particular sequence of combinations of inputs under test will halt on any of the distributed workers, so heuristic search methods such as gradient descent help to skip over apparently desolate territory when the error hasn't changed in a while.)

    The Fuzzing computational task: partition the set of all combinations of inputs for distributed execution with execute-once or consensus to resolve redundant results.

    Design-by-Contract (DbC) patterns include Preconditions and Postconditions (which include tests of Invariance).

    We test Preconditions to exclude Inputs that do not meet the specified Ranges, and we verify the Ranges of Outputs in Postconditions.

    We test Invariance to verify that there haven't been side-effects in other scopes; that variables and their attributes haven't changed after the function - the Command - returns.
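
    These three checks map naturally onto a decorator. The minimal Python sketch below assumes a hypothetical `contract` helper, which is not the API of any particular DbC library. Its invariance check only covers the function's own arguments, using deep copies, and because it relies on `assert`, the checks disappear under `python -O`:

```python
import copy
import functools


def contract(pre=None, post=None):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Precondition: reject inputs outside the specified domain.
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition failed for {fn.__name__}"
            snapshot = copy.deepcopy((args, kwargs))
            result = fn(*args, **kwargs)
            # Invariance: the command must not have mutated its inputs.
            assert (args, kwargs) == snapshot, f"{fn.__name__} mutated its arguments"
            # Postcondition: the output must lie in the specified range.
            if post is not None:
                assert post(result, *args, **kwargs), f"postcondition failed for {fn.__name__}"
            return result
        return wrapper
    return decorate


@contract(pre=lambda xs: len(xs) > 0,
          post=lambda m, xs: m in xs and all(m <= x for x in xs))
def minimum(xs):
    return min(xs)


@contract(post=lambda r, xs: r == sorted(xs))
def bad_sort(xs):
    xs.sort()          # sorts in place, so the invariance check fires
    return xs
```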

    DbC: https://en.wikipedia.org/wiki/Design_by_contract :

    > Design by contract has its roots in work on formal verification, formal specification and Hoare logic.

    TLA+ > Language: https://en.wikipedia.org/wiki/TLA%2B#Language

    Formal verification: https://en.wikipedia.org/wiki/Formal_verification

    From https://news.ycombinator.com/item?id=38138319 :

    > Property testing: https://en.wikipedia.org/wiki/Property_testing

    > awesome-python-testing#property-based-testing: https://github.com/cleder/awesome-python-testing#property-ba...

    > Fuzzing: https://en.wikipedia.org/wiki/Fuzzing

    Software testing > Categorization > [..., Property testing, Metamorphic testing] https://en.wikipedia.org/wiki/Software_testing#Categorizatio...

    --

    Controlled randomness: tests of randomness, random uniform not random norm, rngd:

    From https://news.ycombinator.com/item?id=40630177 :

    > google/paranoid_crypto.lib.randomness_tests: https://github.com/google/paranoid_crypto/tree/main/paranoid...
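
    As one concrete example of a test of randomness, a chi-squared frequency test checks whether byte values look uniformly distributed. Below is a minimal Python sketch. The threshold of about 310.5 is the approximate 1% critical value of the chi-squared distribution with 255 degrees of freedom. A single test like this is far weaker than a whole battery of randomness tests:

```python
from collections import Counter

CRITICAL_255_DOF_1PCT = 310.5  # approx. 99th percentile of chi-squared with 255 dof


def chi_squared_bytes(data: bytes) -> float:
    # Compare observed byte counts with the count expected under a uniform distribution.
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def looks_uniform(data: bytes) -> bool:
    # A good source still fails about 1% of the time at this threshold, by design.
    return chi_squared_bytes(data) < CRITICAL_255_DOF_1PCT


print(looks_uniform(bytes(range(256)) * 16), looks_uniform(bytes(4096)))  # True False
```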

  • paranoid_crypto

    Paranoid's library contains implementations of checks for well known weaknesses on cryptographic artifacts.
