Automated Unit Test Improvement Using Large Language Models at Meta

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • tunnelmole-client

    Tunnelmole - Connect to local servers from anywhere

  • Not sure about improving. But for Tunnelmole (https://github.com/robbie-cahill/tunnelmole-client) I have used GPT-4 to generate unit tests just by showing it a TypeScript module and asking it to create tests.

  • syzkaller

    syzkaller is an unsupervised coverage-guided kernel fuzzer

  • https://arxiv.org/abs/2402.09171 :

    > This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. [...] We believe this is the first report on industrial scale deployment of LLM-generated code backed by such assurances of code improvement.

    Coverage-guided unit test improvement might [with LLMs] be efficient too.

    https://github.com/topics/coverage-guided-fuzzing :

    - e.g. Google/syzkaller is a coverage-guided syscall fuzzer: https://github.com/google/syzkaller

    - Gitlab CI supports coverage-guided fuzzing: https://docs.gitlab.com/ee/user/application_security/coverag...

    - oss-fuzz, osv

    Additional ways to improve tests:

    Hypothesis and pynguin generate tests from type annotations.

    There are various tools to generate type annotations for Python code;

    > pytype (Google) [1], PyAnnotate (Dropbox) [2], and MonkeyType (Instagram) [3] all do dynamic / runtime PEP-484 type annotation type inference [4] to generate type annotations. https://news.ycombinator.com/item?id=39139198

    icontract-hypothesis generates tests from icontract DbC Design by Contract type, value, and invariance constraints specified as precondition and postcondition @decorators:

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • https://google.github.io/oss-fuzz/research/llms/target_gener... https://security.googleblog.com/2023/08/ai-powered-fuzzing-b... https://hn.algolia.com/?q=AI-Powered+Fuzzing%3A+Breaking+the...

    OSSF//fuzz-introspector//doc/Features.md: https://github.com/ossf/fuzz-introspector/blob/main/doc/Feat...

    https://scholar.google.com/scholar?hl=en&as_sdt=0%2C43&q=Fuz... :

    - "Large Language Models Based Fuzzing Techniques: A Survey" (2024) https://arxiv.org/abs/2402.00350 :

  • icontract-hypothesis

    Combine contracts and automatic testing.

  • https://github.com/mristin/icontract-hypothesis

    Nagini and deal-solver attempt to Formally Verify Python code with or without unit tests: https://news.ycombinator.com/item?id=39139198

    Additional research:

    "Fuzz target generation using LLMs" (2023)

  • fuzz-introspector

    Fuzz Introspector -- introspect, extend and optimise fuzzers

  • https://google.github.io/oss-fuzz/research/llms/target_gener... https://security.googleblog.com/2023/08/ai-powered-fuzzing-b... https://hn.algolia.com/?q=AI-Powered+Fuzzing%3A+Breaking+the...

    OSSF//fuzz-introspector//doc/Features.md: https://github.com/ossf/fuzz-introspector/blob/main/doc/Feat...

    https://scholar.google.com/scholar?hl=en&as_sdt=0%2C43&q=Fuz... :

    - "Large Language Models Based Fuzzing Techniques: A Survey" (2024) https://arxiv.org/abs/2402.00350 :

  • AlphaCodium

    Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering""

  • Thanks for sharing this. By far the best tool I've seen in the market centered around Code Integrity is CodiumAI (https://www.codium.ai/). They generate unit test based on entire code repos. Also integrates into SDLC through a PR Agent on GitHub or GitLab. My whole team uses them.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts