Is My Package Reproducible Yet?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • svntogit-community

    Automatic import of the svn 'community' repo (read-only mirror). Discontinued.

    > Even if somebody links a package to their Git repo, who knows what commit correlates to a given release? Even if somebody pushes a "tag" for the release, there is still no guarantee that it's actually the same code

    That's not what reproducible means in this context. Being able to go from a given release artifact to a git repo is not what reproducibility is about.

    Let's look at an example: Docker is written in Go, and is marked as reproducible in Debian but unreproducible in Arch Linux: https://ismypackagereproducibleyet.org/?pkg=docker&query=que... (it is named docker.io on Debian: https://ismypackagereproducibleyet.org/?pkg=docker.io&query=...)

    Why is that?

    Well, Debian sets the "version.BuildTime" option to the same constant value every time it compiles Docker: https://salsa.debian.org/go-team/packages/docker/-/blob/4f1f...

    Arch Linux's PKGBUILD doesn't: https://github.com/archlinux/svntogit-community/blob/cf04bef...

    Because of that, "docker version" on Arch Linux reports the actual day the binary was built, whenever that was, and that string changes every time it is built. Hence, not reproducible.
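
    A toy sketch of the effect, in Python: the hypothetical `build` function below stands in for a compile step that stamps a version and build time into its output (it is not Docker's real build). Two builds made on different days embed different strings and therefore hash differently, while pinning the value, as the Debian packaging does for `version.BuildTime`, makes every rebuild byte-identical.

```python
import hashlib


def build(version: str, build_time: str) -> bytes:
    """Hypothetical stand-in for a compile step that stamps metadata into a binary."""
    return f"version={version}\nbuild_time={build_time}\n".encode("utf-8")


def digest(artifact: bytes) -> str:
    """SHA-256 of an artifact; equal digests mean bit-for-bit identical output."""
    return hashlib.sha256(artifact).hexdigest()


if __name__ == "__main__":
    # Unpinned: each build stamps the moment it ran, so the outputs differ.
    first = build("1.0.0", "2022-06-27T10:00:00Z")
    second = build("1.0.0", "2022-06-28T09:30:00Z")
    print(digest(first) == digest(second))  # False

    # Pinned: the same (hypothetical) constant is embedded on every build.
    pinned = "2022-01-01T00:00:00Z"
    print(digest(build("1.0.0", pinned)) == digest(build("1.0.0", pinned)))  # True
```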

    Reproducible builds are about capturing enough information about how something is built, with specific enough build instructions, that two people can independently produce bit-for-bit identical output. That lets anyone verify that nobody compiled a malicious, backdoored binary, and it increases the chance that a build consistently works (or consistently fails) regardless of who compiles it, or when.
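
    In practice, that verification comes down to comparing digests: rebuild from the same sources with the same instructions, then check that your artifact hashes to the same value as the one someone else published. A minimal Python sketch; the script name and paths in the usage line are placeholders.

```python
import hashlib
import sys
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_build(ours: Path, theirs: Path) -> bool:
    """True when two independently built artifacts are bit-for-bit identical."""
    return sha256_file(ours) == sha256_file(theirs)


if __name__ == "__main__" and len(sys.argv) == 3:
    ours, theirs = Path(sys.argv[1]), Path(sys.argv[2])
    print("reproducible" if same_build(ours, theirs) else "MISMATCH")
```

    Usage (hypothetical paths): `python verify.py ./my-build/docker ./published/docker`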

  • bob

    Bob is a high-level build tool for multi-language projects. (by benchkram)

    Using Nix [0] should solve reproducibility for a package. We use its package manager in bob [1] to achieve reproducible builds for projects.

    [0] https://nixos.org/guides/how-nix-works.html

    [1] https://bob.build/

  • reproducible-central

    Reproducible Central: rebuild instructions for artifacts published to (Maven) Central Repository

    It's quite doable to create reproducible builds with Gradle or Maven: https://reproducible-builds.org/docs/jvm/
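
    Much of that work is about keeping build-machine state out of archives. In Maven this is driven by the `project.build.outputTimestamp` property, and in Gradle by setting `preserveFileTimestamps = false` and `reproducibleFileOrder = true` on archive tasks. The Python sketch below is not either tool's implementation; it only illustrates the normalizations those settings perform on a JAR (which is a ZIP file): a fixed timestamp on every entry, a stable entry order, and fixed permission bits. Identical output also assumes the same compression library version.

```python
import io
import zipfile
from typing import Mapping

# Every entry gets this timestamp. ZIP stores MS-DOS times, which cannot
# represent dates before 1980, so 1980-01-01 is a common fixed choice.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def reproducible_zip(files: Mapping[str, bytes]) -> bytes:
    """Build a ZIP (or JAR) whose bytes depend only on entry names and contents."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Sort names so entry order never depends on filesystem iteration order.
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # pin "made by Unix"; the default varies by host OS
            info.external_attr = 0o644 << 16  # fixed permissions instead of the host's
            zf.writestr(info, files[name])
    return buf.getvalue()


if __name__ == "__main__":
    manifest = b"Manifest-Version: 1.0\n"
    a = reproducible_zip({"App.class": b"\xca\xfe", "META-INF/MANIFEST.MF": manifest})
    b = reproducible_zip({"META-INF/MANIFEST.MF": manifest, "App.class": b"\xca\xfe"})
    print(a == b)  # True: insertion order and build time no longer matter
```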

    Also, releases on Maven Central typically include a sources.jar along with the binaries. I think providing that, as well as javadoc, is required to publish on Maven Central. They also require jars to be signed, and there are some manual checks before they activate your account to verify, e.g., domain ownership and your project's metadata. It's not perfect, but it's better than what many other package managers do. It's actually a bit of a PITA to set up; I've wasted quite a bit of time getting some stuff published there, just trying to figure out their convoluted processes, tools, and error messages. This stuff is way too hard in its current form.

    Not all projects that use these build tools are fully reproducible, but that is probably fairly easy to fix if people raise awareness of the issues around this topic.

    There's even a website that tracks reproducibility for some common libraries on Maven Central: https://github.com/jvm-repo-rebuild/reproducible-central#rea...

    If you want to use artifacts produced straight from git hashes, tags, or branches, jitpack.io is pretty neat.

  • Moby

    The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems

    > Because of that, "docker version" on arch linux outputs that it was built on the actual day it was built, whenever that was, and each time it's built that string changes. Hence, not reproducible.

    FWIW, I'm the maintainer of Docker on Arch, and this is a bit more complicated.

    Arch doesn't need to care about build time because `pacman` sets `SOURCE_DATE_EPOCH` during package building which is then utilized by the docker build system.
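
    The convention pacman relies on is the Reproducible Builds `SOURCE_DATE_EPOCH` specification: when the variable is set, tools embed that timestamp (seconds since the Unix epoch, interpreted as UTC) instead of the current time. Moby's actual handling lives in its own build scripts; the Python function below is only a sketch of the convention, and `build_time` is a hypothetical name.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def build_time(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the timestamp to embed in a build, honoring SOURCE_DATE_EPOCH."""
    env = os.environ if env is None else env
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # No override: fall back to the wall clock (this build won't reproduce).
        moment = datetime.now(timezone.utc)
    else:
        if not raw.isdigit():
            # The spec asks tools to fail loudly on malformed values.
            raise ValueError(f"malformed SOURCE_DATE_EPOCH: {raw!r}")
        moment = datetime.fromtimestamp(int(raw), tz=timezone.utc)
    # Always format in UTC so the build host's time zone cannot leak into the output.
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(build_time({"SOURCE_DATE_EPOCH": "1656288000"}))  # 2022-06-27T00:00:00Z
```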

    https://github.com/moby/moby/commit/760763e9957840f1983a5006...

    If you look at the diff from an actual reproduction of the `docker` package (the Debian Reproducible Builds CI is a fuzzer), you will see there are no issues around build time.

    https://reproducible.archlinux.org/api/v0/builds/359851/diff...
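
    Reports like the one linked above come from dedicated tools such as diffoscope, which understand ELF sections, archives, and much more. As a far cruder, hypothetical sketch, the Python function below lists the byte ranges where two builds diverge; differences confined to a few small ranges (say, a build ID) point to a different cause than differences spread throughout the file.

```python
from typing import List, Optional, Tuple


def differing_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return half-open (start, end) byte offsets where two builds differ.

    If the inputs differ in length, the extra tail is reported as a range too.
    """
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    longest = max(len(a), len(b))
    if longest != common:
        # Extend a run that reaches the end of the shorter input, else add the tail.
        if ranges and ranges[-1][1] == common:
            ranges[-1] = (ranges[-1][0], longest)
        else:
            ranges.append((common, longest))
    return ranges


if __name__ == "__main__":
    # Two fake "binaries" that differ only in an ID-like region.
    print(differing_ranges(b"header-AAAA-code", b"header-BBBB-code"))  # [(7, 11)]
```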

    What you do see are some weird differences around `NT_GNU_BUILD_ID` and `GO BUILDID`.

    This is because of `lto`. In Arch we try to build most Go packages with cgo so that hardening flags can be applied. The issue is that Go's C code generation isn't actually reproducible with `lto` enabled.

    One issue is the one I submitted here: https://github.com/golang/go/pull/53528

    Another issue, which I have been trying to debug, is why the code snippet located here doesn't reproduce:

    https://pub.linderud.dev/cg/

    As a result, the BUILDID generated and set by the cgo build process isn't reproducible.
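
    The Go toolchain can print a binary's build ID directly with `go tool buildid <binary>`. For comparing two builds programmatically, here is a rough Python sketch that scans the raw bytes for the marker Go embeds. The marker layout is an assumption based on the toolchain's `cmd/internal/buildid` package and may change between Go releases, so treat this as illustrative only. Two builds whose Go build IDs differ, as described above, will also differ bit for bit.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Assumed marker layout, based on the Go toolchain's cmd/internal/buildid package:
#   \xff Go build ID: "<id>"\n \xff
GO_BUILD_ID_RE = re.compile(rb'\xff Go build ID: "([^"]*)"\n \xff')


def go_build_id(binary: bytes) -> Optional[str]:
    """Return the Go build ID embedded in a binary's bytes, or None if absent."""
    match = GO_BUILD_ID_RE.search(binary)
    return match.group(1).decode("ascii", errors="replace") if match else None


def build_ids_match(a: bytes, b: bytes) -> bool:
    """True when two binaries carry the same Go build ID (or both lack one)."""
    return go_build_id(a) == go_build_id(b)


if __name__ == "__main__" and len(sys.argv) == 3:
    ours, theirs = Path(sys.argv[1]).read_bytes(), Path(sys.argv[2]).read_bytes()
    print(go_build_id(ours))
    print(go_build_id(theirs))
    print("same build ID" if build_ids_match(ours, theirs) else "build IDs differ")
```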

    I could disable lto and all the binary hardening and docker would probably be reproducible, but where is the fun in that :)?

  • go

    The Go programming language

  • slsa

    Supply-chain Levels for Software Artifacts

    Maybe something like SLSA (https://slsa.dev/) could solve your issue. The idea is to create a metadata standard so that packages can provide build provenance information. Public build systems like GitHub Actions can then sign a manifest stating which build actions were executed and what the inputs and outputs of those actions were. That way, you could verify that a package was built from a specific set of sources with a specific set of build actions, without requiring reproducibility. This assumes you trust the public build system not to lie, though.
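
    A rough Python sketch of the last check a consumer might make: SLSA provenance is typically expressed as an in-toto Statement whose `subject` list records artifact names and digests. The function below (a hypothetical helper, not part of any SLSA tooling) only confirms that an artifact's SHA-256 is listed there. It deliberately skips verifying the signature and the builder's identity, which is exactly where trust in the build system comes in; tools such as the SLSA project's slsa-verifier handle that part.

```python
import hashlib
import json


def artifact_in_provenance(artifact: bytes, statement_json: str) -> bool:
    """Check that an artifact's SHA-256 appears among a statement's subjects.

    Assumes the in-toto Statement layout used by SLSA provenance, and that the
    signature over the statement was already verified elsewhere (not done here).
    """
    statement = json.loads(statement_json)
    if not str(statement.get("predicateType", "")).startswith("https://slsa.dev/provenance/"):
        return False  # not a SLSA provenance statement
    want = hashlib.sha256(artifact).hexdigest()
    return any(
        subject.get("digest", {}).get("sha256") == want
        for subject in statement.get("subject", [])
    )


if __name__ == "__main__":
    artifact = b"example binary"
    statement = json.dumps({
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": "app", "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {},  # builder and build definition omitted in this toy example
    })
    print(artifact_in_provenance(artifact, statement))  # True
    print(artifact_in_provenance(b"tampered", statement))  # False
```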
