We've just added an applicability section, which explains in a bit more detail what we do. We don't have any ELF-specific heuristics.
https://github.com/elfshaker/elfshaker#applicability
In summary, for manyclangs, we compile with -ffunction-sections and -fdata-sections, and store the resulting object files. These are fairly robust to insertions and deletions, since addresses are section-relative, so the damage from any addresses changing is contained within the sections. A somewhat surprising thing is that this works well enough when building many revisions of clang/llvm -- as you go from commit to commit, many commits have bit-identical object files, even though the build system often wants to rebuild them because some input has changed.
elfshaker packs use a heuristic of sorting all unique objects by size before concatenating them and compressing them with zstandard. This gives us an amortized cost per commit of something like 40 KiB after compression.
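The sort-then-concatenate heuristic can be sketched in a few lines of Python. This is my own illustration, not elfshaker's code: the `pack` helper name is invented, and the stdlib `zlib` stands in for zstandard.

```python
import hashlib
import zlib


def pack(objects: list[bytes]) -> bytes:
    """Sketch of the pack heuristic: deduplicate object files by
    content, sort the unique ones by size, concatenate, compress."""
    # Keep one copy of each unique object, keyed by content hash.
    unique = {hashlib.sha256(o).digest(): o for o in objects}
    # Sorting by size tends to place similar objects near each other,
    # which helps the compressor find long matches.
    ordered = sorted(unique.values(), key=len)
    return zlib.compress(b"".join(ordered), level=9)


# Two "commits" sharing most of their object files: the shared
# objects are stored only once in the pack.
commit_a = [b"A" * 4096, b"B" * 2048, b"C" * 1024]
commit_b = [b"A" * 4096, b"B" * 2048, b"D" * 1024]  # one object changed
packed = pack(commit_a + commit_b)
raw = sum(len(o) for o in commit_a + commit_b)
print(f"raw: {raw} bytes, packed: {len(packed)} bytes")
```

Because deduplication happens before compression, adding a commit whose objects are mostly bit-identical to an earlier one costs almost nothing, which is where the small amortized per-commit figure comes from.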
Author here. elfshaker itself does not depend on any particular architecture, to our knowledge. We support the architectures we have immediate use for.
manyclangs provides binary pack files for aarch64 because that's what we have immediate use for. If elfshaker and manyclangs prove useful to people, I would love to see resources invested in making them more widely useful.
You can still run the manyclangs binaries on other architectures using qemu [0], at some performance cost, which may be tolerable depending on your use case.
[0] https://github.com/elfshaker/manyclangs/tree/main/docker-qem...
I experimented with something similar with a Linux distribution's package binary cache.
Using `bup` (a deduplicating backup tool built on the git packfile format), I deduplicated 4 Chromium builds down to the size of 1.
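The idea behind this kind of deduplication is content-defined chunking: blobs are split at boundaries derived from a rolling hash of the content, so an insertion only disturbs the chunks near the edit, and everything after it re-aligns and deduplicates. Here is a toy Python sketch of that idea, not bup's actual rollsum; the `chunks` and `store` helpers and all parameters are invented for illustration:

```python
import hashlib
import random


def chunks(data: bytes, window: int = 64, mask: int = 0x3FF) -> list[bytes]:
    """Split data at content-defined boundaries using a rolling sum
    over the last `window` bytes. Boundaries depend only on local
    content, so an insertion only shifts nearby chunk edges."""
    out, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]
        # Cut when the low bits of the rolling sum hit the mask,
        # or when a chunk grows past a hard cap.
        if (rolling & mask) == mask or i - start >= 8192:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out


def store(builds: list[bytes]) -> dict[bytes, bytes]:
    """Deduplicated store: each unique chunk kept once, keyed by hash."""
    repo: dict[bytes, bytes] = {}
    for b in builds:
        for c in chunks(b):
            repo[hashlib.sha256(c).digest()] = c
    return repo


# Two "builds" that differ by a small insertion in the middle.
random.seed(0)
base = bytes(random.randrange(256) for _ in range(100_000))
build1 = base
build2 = base[:50_000] + b"patch" + base[50_000:]

repo = store([build1, build2])
stored = sum(len(c) for c in repo.values())
print(f"naive: {len(build1) + len(build2)} bytes, deduplicated: {stored} bytes")
```

In this sketch the deduplicated store ends up close to the size of a single build, because only the handful of chunks around the insertion differ between the two inputs.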
Large download/storage requirements for updates are one of NixOS's few drawbacks, and I think deduplication could solve that pretty much completely.
Details: https://github.com/NixOS/nixpkgs/issues/89380
Somewhat related (and definitely born out of a very similar use case): https://github.com/mhx/dwarfs
I initially built this for having access to 1000+ Perl installations (spanning decades of Perl releases). The compression in this case is not quite as impressive (50 GiB to around 300 MiB), but access times are typically in the millisecond region.