Run e2e tests 10x faster using firecracker VMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Buildkite

    The Buildkite Agent is an open-source toolkit written in Go for securely running build jobs on any device or network (by buildkite)

  • A few issues I have with this blog post:

    1. It doesn't show off the unique capabilities of firecracker very well.

    2. The comparison is not very fair.

    2a. The GitHub Action is run without any caching. Just by adding two lines to your build-push-action step ("cache-from: type=gha" and "cache-to: type=gha,mode=max") you can make it a lot faster.

    2b. ~1m20s of the time is just "VM start". GitHub Actions has had a rough time recently, but you should never wait that long to get your CI running in day-to-day operation.

    2c. The tests are unrealistically short at 20s, which is what allows the author to get to their 10x-faster number.

    Let's say the GitHub Action starts in 5 seconds, the GitHub Actions cache reduces the build time to 1 minute and the tests take 10 minutes to run. Now Firecracker is 10% faster ...

    You can also get comparable performance out of https://buildkite.com/, which lets you self-host runners on AWS. That means you're almost guaranteed a hot Docker cache (running against locally attached SSDs), so you can start running your tests (almost) as fast, with much more mature tooling.
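    The caching tweak described in 2a corresponds to a workflow roughly like the following. This is a hypothetical sketch, not the post's actual workflow; the job layout, tags, and action versions are placeholders, while the two cache lines are the ones quoted above:

    ```yaml
    # Hypothetical GitHub Actions job illustrating 2a.
    # Only the cache-from/cache-to values come from the comment above;
    # everything else is a placeholder.
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # buildx is required for the gha cache backend
          - uses: docker/setup-buildx-action@v3
          - uses: docker/build-push-action@v5
            with:
              push: false
              tags: example/app:ci
              cache-from: type=gha
              cache-to: type=gha,mode=max
    ```

    With mode=max, all intermediate layers are exported to the GitHub Actions cache, not just the final image's layers, which is what makes subsequent builds substantially faster.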

  • flyctl

    Command line tools for fly.io services

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • Bazel

    a fast, scalable, multi-language and extensible build system

  • > Why do you need to snapshot live processes?

    Oftentimes there are long-lived processes which rarely change but take a long time to warm up: the Bazel [1] agent for C++ projects, the buildkit [2] state for docker, or the running Postgres or Redis server for a cloud-native app, for example.

    It's why running "docker build" twice on your laptop is so fast, but running "docker build" in CI seems glacially slow.

    > why is docker-in-docker a requirement, and how is that easier than qemu in qemu or qemu in docker or whatever?

    The example given was running "docker-compose build", so you'd need either docker-in-firecracker (this post), docker-in-docker, or docker-in-qemu. You'd almost never run docker-compose build on bare metal in practice, because you'd immediately need to push the images you built somewhere to use them.

    [1] https://bazel.build/
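    The laptop-vs-CI contrast above comes down to the Docker layer cache: on a laptop the cache persists between runs, while a fresh CI machine starts cold every time. A minimal illustration, using a hypothetical Node.js Dockerfile (not from the post):

    ```dockerfile
    FROM node:20-slim
    WORKDIR /app
    # Copy only the dependency manifests first: this layer and the
    # `npm ci` below are reused from cache as long as the lockfile
    # is unchanged, which is the common case.
    COPY package.json package-lock.json ./
    RUN npm ci
    # Source changes only invalidate the layers from here down.
    COPY . .
    RUN npm run build
    ```

    On a warm laptop, everything above `COPY . .` is a cache hit; on a cold CI runner, every layer is rebuilt from scratch, which is why the same build feels glacially slow there.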

  • livechat-example

    I tried to get docker layer caching working within GHA for a second benchmark, but it seems like none of the approaches work particularly well for a "docker-compose build". I'd happily amend the post with a second benchmark if you wouldn't mind opening a PR based on the existing one [1].

    https://github.com/webappio/livechat-example/blob/be7c9121c1...

    The point still stands for 2c: you can very easily parallelize with firecracker (by taking a snapshot of the state right before the tests run, then loading it a bunch of times).
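    The snapshot-then-load-many-times idea maps onto Firecracker's snapshot API, which is driven over a Unix socket. A rough sketch, with all paths and the socket name as placeholders (the exact request bodies vary between Firecracker versions, so treat this as illustrative rather than copy-pasteable):

    ```shell
    # Pause the warmed-up VM, then take a full snapshot of its state.
    curl --unix-socket /tmp/fc-warm.sock -X PATCH "http://localhost/vm" \
      -d '{"state": "Paused"}'
    curl --unix-socket /tmp/fc-warm.sock -X PUT "http://localhost/snapshot/create" \
      -d '{"snapshot_type": "Full",
           "snapshot_path": "/snapshots/state.json",
           "mem_file_path": "/snapshots/memory.bin"}'

    # Each fresh firecracker process can then restore from that snapshot,
    # so N test shards all start from the same warmed-up state.
    curl --unix-socket /tmp/fc-shard1.sock -X PUT "http://localhost/snapshot/load" \
      -d '{"snapshot_path": "/snapshots/state.json",
           "mem_file_path": "/snapshots/memory.bin",
           "resume_vm": true}'
    ```

    Restoring from a memory snapshot skips the VM boot and all process warm-up, which is what makes fan-out parallelization cheap compared with booting N cold runners.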

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
