Elfshaker: GiB – 100 MiB, with 1s access time

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • elfshaker

    elfshaker stores binary objects efficiently

    We've just added an applicability section, which explains a bit more what we do. We don't have any ELF specific heuristics.


    In summary, for manyclangs, we compile with -ffunction-sections and -fdata-sections, and store the resulting object files. These are fairly robust to insertions and deletions: because the addresses are section-relative, the damage from any address changing is contained within its section. A somewhat surprising thing is that this works well enough when building many revisions of clang/llvm. As you go from commit to commit, many commits have bit-identical object files, even though the build system often wants to rebuild them because some input has changed.

    elfshaker packs use a heuristic of sorting all unique objects by size before concatenating them and compressing the result with zstandard. This gives us an amortized cost-per-commit of something like 40 KiB after compression.
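
    The packing heuristic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not elfshaker's actual implementation or pack format: the names `pack_objects` and `extract` are hypothetical, SHA-1 is an arbitrary choice of content hash, and `zlib` from the Python standard library stands in for zstandard so the example has no external dependencies.

```python
import hashlib
import zlib


def pack_objects(snapshots):
    """Pack snapshots given as {snapshot_name: {path: bytes}}.

    Returns (packed_bytes, offsets, index). Hypothetical layout, for illustration only.
    """
    unique = {}  # content digest -> object bytes (bit-identical objects stored once)
    index = {}   # snapshot name -> {path: content digest}
    for name, files in snapshots.items():
        index[name] = {}
        for path, data in files.items():
            digest = hashlib.sha1(data).hexdigest()
            unique.setdefault(digest, data)
            index[name][path] = digest

    # Sort unique objects by size (digest as a tie-breaker for determinism),
    # so that objects of similar size end up next to each other in the stream.
    ordered = sorted(unique.items(), key=lambda kv: (len(kv[1]), kv[0]))

    offsets = {}  # content digest -> (start, length) within the concatenated blob
    blob = bytearray()
    for digest, data in ordered:
        offsets[digest] = (len(blob), len(data))
        blob += data

    # zlib is a stand-in for zstandard here.
    packed = zlib.compress(bytes(blob), 9)
    return packed, offsets, index


def extract(packed, offsets, index, snapshot, path):
    """Recover one file from one snapshot."""
    blob = zlib.decompress(packed)
    start, length = offsets[index[snapshot][path]]
    return blob[start:start + length]


snapshots = {
    "commit-a": {"foo.o": b"A" * 1000, "bar.o": b"B" * 500},
    "commit-b": {"foo.o": b"A" * 1000, "bar.o": b"B" * 499 + b"C"},
}
packed, offsets, index = pack_objects(snapshots)
```

    In this toy input, `foo.o` is bit-identical across both commits and is stored only once, which is the effect the comment describes for unchanged object files between nearby clang/llvm revisions.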

  • manyclangs

    Repository hosting unofficial binary pack files for many commits of LLVM

    Author here. elfshaker itself does not have a dependency on any architecture to our knowledge. We support the architectures we have immediate use of.

    manyclangs provides binary pack files for aarch64 because that's what we have immediate use for. If elfshaker and manyclangs prove useful to people, I would love to see resources invested to make them more widely useful.

    You can still run the manyclangs binaries on other architectures using qemu [0], with some performance cost, which may be tolerable depending on your use case.

    [0] https://github.com/elfshaker/manyclangs/tree/main/docker-qem...

  • nixpkgs

    Nix Packages collection

    I experimented with something similar with a Linux distribution's package binary cache.

    Using `bup` (deduplicating backup tool using git packfile format) I deduplicated 4 Chromium builds into the size of 1.

    Large download/storage requirements for updates are one of NixOS's few drawbacks, and I think deduplication could solve that pretty much completely.

    Details: https://github.com/NixOS/nixpkgs/issues/89380
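
    As a rough illustration of why chunk-level deduplication can collapse several similar builds into little more than the size of one, here is a minimal sketch of content-defined chunking. It is not bup's actual algorithm or the git packfile format: the rolling sum, the `WINDOW` and `MASK` constants, and the function names are all simplifying assumptions made for this example.

```python
import hashlib

WINDOW = 16   # bytes covered by the rolling sum (illustrative value)
MASK = 0x3F   # cut when the low 6 bits are all set (illustrative value)


def split_chunks(data):
    """Split data at content-defined boundaries using a simple rolling sum."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        # Boundaries depend on nearby content, so an insertion only disturbs
        # the chunks around it; later chunks line up again.
        if i + 1 - start >= WINDOW and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(builds):
    """Store builds given as {name: bytes}; return (chunk_store, recipes)."""
    chunk_store = {}  # chunk digest -> chunk bytes, each distinct chunk kept once
    recipes = {}      # build name -> ordered list of chunk digests
    for name, data in builds.items():
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            chunk_store.setdefault(digest, chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return chunk_store, recipes


def restore(chunk_store, recipe):
    """Rebuild a build from its recipe."""
    return b"".join(chunk_store[d] for d in recipe)


# Two synthetic "builds" that differ by a small insertion.
base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(600))
builds = {"build-1": base, "build-2": base[:5000] + b"inserted bytes" + base[5000:]}
chunk_store, recipes = store(builds)
stored_bytes = sum(len(c) for c in chunk_store.values())
```

    Here `stored_bytes` comes out smaller than the combined size of the two inputs, because the chunks the builds have in common are kept only once.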

  • dwarfs

    A fast high compression read-only file system

    Somewhat related (and definitely born out of a very similar use case): https://github.com/mhx/dwarfs

    I initially built this for having access to 1000+ Perl installations (spanning decades of Perl releases). The compression in this case is not quite as impressive (50 GiB to around 300 MiB), but access times are typically in the millisecond region.
