Tar is an ill-specified format

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • squashfs-tools-ng

    A new set of tools and libraries for working with SquashFS images

  • I once foolishly thought I'd write a tar parser because, "how hard can it be?" [1]

    I simply tried to follow the tar(5) man page[2], and got a reference test set from another website posted previously on HN[3].

    Along the way I discovered that NetBSD pax apparently cannot handle the PAX format[3]. My parser also inadvertently uncovered that git-archive was doing the checksums wrong; nobody had noticed because other tar parsers were more lax about it[4].

    As the article describes (as does the man page), tar is actually a really simple format, but there are just so many variants to choose from.

    Turns out, if you strive for maximum compatibility, it's easiest to stick to what GNU tar does. If you think about it, IMO in many ways the GNU project ended up doing "embrace, extend, extinguish" with Unix.

    [1] https://github.com/AgentD/squashfs-tools-ng/tree/master/lib/...

    [2] https://www.freebsd.org/cgi/man.cgi?query=tar&sektion=5

    [3] https://mgorny.pl/articles/portability-of-tar-features.html

    [4] https://www.spinics.net/lists/git/msg363049.html
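To make the checksum remark concrete, here is a minimal sketch of how a ustar header checksum is computed: the unsigned sum of all 512 header bytes, with the 8-byte checksum field (offset 148) counted as ASCII spaces. The helper names (`header_checksum`, `stored_checksum`, `sample_header`) are hypothetical and not taken from any of the projects above. The parsing of the stored field is deliberately lenient, which is exactly the kind of laxness that can hide a writer's mistake.

```python
import io
import tarfile

BLOCK = 512          # tar header size in bytes
CHK_START = 148      # offset of the checksum field
CHK_END = 156        # checksum field is 8 bytes long


def header_checksum(block: bytes) -> int:
    """Sum all header bytes, counting the checksum field as 8 spaces."""
    if len(block) != BLOCK:
        raise ValueError("a tar header is exactly 512 bytes")
    return sum(block[:CHK_START]) + 8 * 0x20 + sum(block[CHK_END:])


def stored_checksum(block: bytes) -> int:
    """Read the octal checksum field, tolerating NUL/space padding (lenient)."""
    field = block[CHK_START:CHK_END].strip(b" \0")
    return int(field.decode("ascii"), 8)


def sample_header() -> bytes:
    """Hypothetical helper: build a one-file archive and return its first header."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        payload = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()[:BLOCK]
```

A strict parser would additionally check the exact layout of the field (digit count and terminator) rather than stripping padding, so the two approaches can disagree on whether the same archive is valid.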

  • genext2fs

    genext2fs - ext2 filesystem generator for embedded systems

  • I have also written my own tar parser, in https://github.com/bestouff/genext2fs - oh boy, it's not easy! My parser is of course incomplete and won't handle corner cases, of which there are plenty. In the end I added libarchive as an alternative to my hand-rolled code, because that's what works best.

  • dracut

    dracut the event driven initramfs infrastructure

  • (source: https://unix.stackexchange.com/a/266090; also mentioned there is a skipcpio tool: https://github.com/dracutdevs/dracut/blob/master/skipcpio/sk...)

    This hack would also work for extracting concatenated tarballs (without GNU tar's --ignore-zeros option). One annoyance is the warning about the end of the archive.
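For comparison, Python's standard tarfile module has an `ignore_zeros` option that plays a role similar to GNU tar's --ignore-zeros. The sketch below is not the hack from the linked answer; it only illustrates why concatenated tarballs need special handling. A reader normally stops at the zero-filled end-of-archive blocks of the first archive. The helper names `make_tar` and `list_members` are hypothetical.

```python
import io
import tarfile


def make_tar(name: str, data: bytes) -> bytes:
    """Build a one-file ustar archive in memory (illustrative helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(blob: bytes, ignore_zeros: bool) -> list:
    """List member names, optionally skipping zero-filled blocks."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:",
                      ignore_zeros=ignore_zeros) as tf:
        return tf.getnames()


# Two complete archives glued together, as with `cat a.tar b.tar`.
combined = make_tar("a.txt", b"first") + make_tar("b.txt", b"second")

print(list_members(combined, ignore_zeros=False))
print(list_members(combined, ignore_zeros=True))
```

With `ignore_zeros=False` only the first archive's member should be listed; with `ignore_zeros=True` the reader skips the end-of-archive padding and continues into the second archive.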
