Elfshaker: GiB – 100 MiB, with 1s access time

elfshaker

Mentions: 12 | Stars: 2,290 | Activity: 7.9 | Language: Rust

elfshaker stores binary objects efficiently

We've just added an applicability section, which explains a bit more about what we do. We don't have any ELF-specific heuristics.
https://github.com/elfshaker/elfshaker#applicability
In summary, for manyclangs, we compile with -ffunction-sections and -fdata-sections and store the resulting object files. These are fairly robust to insertions and deletions: the addresses are section-relative, so the damage from any address changing is contained within its section. Somewhat surprisingly, this works well enough when building many revisions of clang/llvm: as you go from commit to commit, many object files are bit-identical, even though the build system often wants to rebuild them because some input has changed.
elfshaker packs use a heuristic: sort all unique objects by size, concatenate them, and compress the result with zstandard. This gives us an amortized cost per commit of something like 40 KiB after compression.
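
The pack heuristic described above can be illustrated with a small sketch. This is not elfshaker's actual code or on-disk format: the function names `build_pack` and `extract` and the in-memory layout are hypothetical, and `zlib` from the Python standard library stands in for zstandard so that the example runs without extra dependencies.

```python
import hashlib
import zlib


def build_pack(snapshots):
    """Build a toy pack from {snapshot_name: {path: bytes}}.

    Hypothetical layout, for illustration only (not elfshaker's format).
    """
    unique = {}  # content digest -> object bytes, stored once
    index = {}   # snapshot name -> {path: content digest}
    for name, files in snapshots.items():
        index[name] = {}
        for path, content in files.items():
            digest = hashlib.sha1(content).hexdigest()
            unique.setdefault(digest, content)  # bit-identical objects deduplicate here
            index[name][path] = digest

    # Sort unique objects by size (digest breaks ties deterministically),
    # then concatenate them into one buffer.
    ordered = sorted(unique.items(), key=lambda kv: (len(kv[1]), kv[0]))
    offsets = {}  # content digest -> (offset, length) in the uncompressed buffer
    buffer = bytearray()
    for digest, content in ordered:
        offsets[digest] = (len(buffer), len(content))
        buffer += content

    # zlib is a stand-in here; the comment above says elfshaker uses zstandard.
    return index, offsets, zlib.compress(bytes(buffer), 9)


def extract(pack, snapshot):
    """Return {path: bytes} for one snapshot of a toy pack."""
    index, offsets, compressed = pack
    buffer = zlib.decompress(compressed)
    files = {}
    for path, digest in index[snapshot].items():
        offset, length = offsets[digest]
        files[path] = buffer[offset:offset + length]
    return files
```

In this sketch, an object file that is unchanged between two commits is stored once and costs only an index entry per commit, which is the effect the amortized per-commit figure relies on.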

manyclangs

Mentions: 3 | Stars: 134 | Activity: 2.6 | Language: Dockerfile

Repository hosting unofficial binary pack files for many commits of LLVM

Author here. elfshaker itself does not have a dependency on any architecture to our knowledge. We support the architectures we have immediate use of.
manyclangs provides binary pack files for aarch64 because that's what we have immediate use of. If elfshaker and manyclangs prove useful to people, I would love to see resources invested to make them more widely useful.
You can still run the manyclangs binaries on other architectures using qemu [0], with some performance cost, which may be tolerable depending on your use case.
[0] https://github.com/elfshaker/manyclangs/tree/main/docker-qem...

nixpkgs

Mentions: 972 | Stars: 15,581 | Activity: 10.0 | Language: Nix

Nix Packages collection & NixOS

I experimented with something similar with a Linux distribution's package binary cache.
Using `bup` (deduplicating backup tool using git packfile format) I deduplicated 4 Chromium builds into the size of 1.
Large download/storage requirements for updates are one of NixOS's few drawbacks, and I think deduplication could solve that pretty much completely.
Details: https://github.com/NixOS/nixpkgs/issues/89380
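
To illustrate why chunk-level deduplication can shrink several similar builds to roughly the size of one, here is a minimal sketch of content-defined chunking. bup splits files with a rolling checksum; this sketch uses a simpler gear-style hash as a stand-in, not bup's algorithm, and the names `GEAR`, `chunks`, and `dedup_store` are hypothetical.

```python
import hashlib
import random

# Fixed pseudo-random table mapping each byte value to a 32-bit number.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def chunks(data, mask=0xFFF):
    """Split bytes into chunks whose boundaries depend on nearby content.

    With a 12-bit mask the average chunk is about 4 KiB on random data.
    """
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & mask == 0:          # boundary chosen by content, not by offset
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]         # trailing partial chunk


def dedup_store(blobs):
    """Store every distinct chunk once, keyed by its SHA-256 digest."""
    store = {}
    for blob in blobs:
        for chunk in chunks(blob):
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return store
```

Because boundaries are derived from content rather than fixed offsets, inserting a few bytes near the start of a file changes only the chunks around the insertion; later chunks line up again and are shared between the two versions.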

dwarfs

Mentions: 21 | Stars: 1,860 | Activity: 9.9 | Language: C++

A fast high compression read-only file system for Linux, Windows and macOS

Somewhat related (and definitely born out of a very similar use case): https://github.com/mhx/dwarfs
I initially built this to have access to 1000+ Perl installations (spanning decades of Perl releases). The compression in this case is not quite as impressive (50 GiB to around 300 MiB), but access times are typically in the millisecond range.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.