Our great sponsors
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
While GNU tar inherits this from the original tar, it doesn't seem to be something intrinsic to the PDP-11 or DEC given that the earlier Unix 'tap' archival system uses 'plain binary numbers ("base 256")', not octals.
Here's the V7 tar.c: https://github.com/dspinellis/unix-history-repo/blob/Researc...
and it's documentation: https://github.com/dspinellis/unix-history-repo/blob/Researc...
The C code clearly shows the octal (I've never used "%o" myself!):
sscanf(dblock.dbuf.mode, "%o", &i);
> You may think that the numeric values (file_mode, file_size, file_mtime, ...) would be encoded in base 10, or maybe in hex, or using plain binary numbers ("base 256"). But no, they're actually encoded as octal strings (with a NUL terminator, or sometimes a space terminator). Tar is the only file format I know of which uses base 8 to encode numbers.
Tar’s “competitor”, cpio, does this as well (at least in one of the popular implementations). The Xcode XIP file, if you’re familiar with that particular format, is a couple layers of wrapping on top of a cpio archive, so deep inside my tool to expand these there’s a spot where I read these all out in a similar fashion: https://github.com/saagarjha/unxip/blob/5dcfe2c0f9578bc43070...
More generally, though, both tar and cpio doesn’t have an index because they don’t need them, they’re meant to be decompressed in a streaming fashion where this information isn’t necessary. It’s kind of inconvenient if you want to extract a single file, but for large archives you really want to compress files together and maintaining an index for this is both expensive and nontrivial, so I guess these formats are just “good enough”.
"In any case, the result is that a tar extractor which wants to support both pax tar files and GNU tar files needs to support 5 different ways of reading the file path, 5 different ways of reading the link path, and 3 different ways of reading the file size."
https://github.com/libarchive/libarchive
Author of openat2 here, yes it would (as well as protecting against races where the target directory is having path components changed to symlinks during extraction). RESOLVE_IN_ROOT would be more akin to extracting in a chroot(2).
I've been working on a userspace library[1] which would make writing userspace programs that interact with these kinds of dangerous paths more safe (though it's been on the backburner recently).
[1]: https://github.com/openSUSE/libpathrs