Xz format considered inadequate for long-term archiving

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • dwarfs

    A fast high compression read-only file system

    https://github.com/mhx/dwarfs

    "DwarFS compression is an order of magnitude better than SquashFS compression, it's 6 times faster to build the file system, it's typically faster to access files on DwarFS and it uses less CPU resources."

  • pixz

    Parallel, indexed xz compressor

    pixz (https://github.com/vasi/pixz) is a nice parallel xz that additionally creates an index of tar files so you can decompress individual files. I wonder if dpkg could be extended to do something similar.
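
    The per-file index idea can be sketched with Python's standard `lzma` module. This uses a hypothetical toy layout, not pixz's real one (pixz keeps xz blocks plus an index of tar members inside the .xz file itself). Here each member is compressed as its own .xz stream, and a separate in-memory dictionary records each member's offset and length, so one file can be extracted without decompressing the others. The names `build_indexed_archive` and `extract_member` are illustrative only.

```python
import lzma
from typing import Dict, Tuple

# Hypothetical toy layout: every member is its own .xz stream, and the index
# maps member name -> (offset, length) of that stream inside the blob.
Index = Dict[str, Tuple[int, int]]


def build_indexed_archive(members: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Compress every member independently and record where its stream starts."""
    blob = bytearray()
    index: Index = {}
    for name, data in members.items():
        compressed = lzma.compress(data, format=lzma.FORMAT_XZ)
        index[name] = (len(blob), len(compressed))
        blob.extend(compressed)
    return bytes(blob), index


def extract_member(blob: bytes, index: Index, name: str) -> bytes:
    """Decompress only the requested member; other streams are never read."""
    offset, length = index[name]
    return lzma.decompress(blob[offset:offset + length])


if __name__ == "__main__":
    files = {"a.txt": b"hello " * 1000, "b.txt": b"world " * 1000}
    blob, idx = build_indexed_archive(files)
    print(extract_member(blob, idx, "b.txt")[:11])  # b'world world'
```

    Because concatenated .xz streams form a valid .xz file, `lzma.decompress(blob)` on the whole blob still returns every member back to back; the index only adds random access.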

  • zstd

    Zstandard - Fast real-time compression algorithm

    zstd still doesn't have a seekable format as part of the official standard (I wish it did): https://github.com/facebook/zstd/issues/395#issuecomment-535...
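
    To illustrate what a seekable format provides, here is a rough sketch under stated assumptions. The input is split into fixed-size chunks (the 4096-byte `CHUNK_SIZE` is an arbitrary choice), each chunk is compressed independently, and a seek table of uncompressed start offsets lets a reader decompress only the chunks that overlap a requested range. The standard-library `zlib` stands in for zstd so the example needs no third-party package. It does not reproduce any zstd seekable-format specification.

```python
import bisect
import zlib
from typing import List, Tuple

CHUNK_SIZE = 4096  # arbitrary; real seekable formats let the writer choose


def compress_seekable(data: bytes) -> Tuple[List[bytes], List[int]]:
    """Compress fixed-size chunks independently; return frames and a seek table."""
    frames: List[bytes] = []
    starts: List[int] = []  # uncompressed offset at which each frame begins
    for pos in range(0, len(data), CHUNK_SIZE):
        starts.append(pos)
        frames.append(zlib.compress(data[pos:pos + CHUNK_SIZE]))
    return frames, starts


def read_range(frames: List[bytes], starts: List[int], offset: int, size: int) -> bytes:
    """Return data[offset:offset + size], decompressing only overlapping frames."""
    if size <= 0 or not frames or offset < 0:  # negative offsets are not supported
        return b""
    first = bisect.bisect_right(starts, offset) - 1  # frame containing `offset`
    out = bytearray()
    i = first
    while i < len(frames) and starts[i] < offset + size:
        out.extend(zlib.decompress(frames[i]))
        i += 1
    skip = offset - starts[first]
    return bytes(out[skip:skip + size])
```

    On disk the frames would be stored back to back, and the seek table would also record each frame's compressed offset. Smaller chunks make random reads cheaper but give the compressor less context, so the ratio suffers.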

  • zpaqlpy

    Compiles a zpaqlpy source file (a Python-subset) to a ZPAQ configuration file for usage with zpaqd

    ZPAQ is the name of the tool, but it is also the name of the container format the tool uses. ZPAQ embeds the decompression algorithm in the archive, so one could store zstd-compressed blocks in ZPAQ archives as soon as a ZPAQL decompressor for zstd exists (e.g., for brotli there is a slow one implemented in a Python subset and compiled to ZPAQL: https://github.com/pothos/zpaqlpy).

    I don't know exactly whether other formats are better for seeking and streaming, but since the baseline is tar, ZPAQ (in the 2.0 spec) is already better: it supports deduplication, archives can even be updated append-only, and compression is well integrated rather than an afterthought wrapped around the container.
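
    The deduplication and append-only properties can be illustrated with a simplified sketch. It assumes fixed 1 KiB blocks identified by SHA-256 digests, whereas ZPAQ itself chooses fragment boundaries from the content; the class `AppendOnlyDedupStore` is hypothetical. Each update appends only blocks whose digest has not been seen before and records a new snapshot, so earlier versions remain restorable.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 1024  # assumed; ZPAQ picks fragment boundaries from the content


class AppendOnlyDedupStore:
    """Toy append-only archive: blocks are only ever appended, never rewritten."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []                    # unique payloads, append order
        self.by_hash: Dict[bytes, int] = {}              # SHA-256 digest -> block id
        self.versions: List[Dict[str, List[int]]] = []   # one snapshot per update

    def add_version(self, files: Dict[str, bytes]) -> None:
        snapshot: Dict[str, List[int]] = {}
        for name, data in files.items():
            ids: List[int] = []
            for pos in range(0, len(data), BLOCK_SIZE):
                block = data[pos:pos + BLOCK_SIZE]
                digest = hashlib.sha256(block).digest()
                if digest not in self.by_hash:           # unseen content: append it
                    self.by_hash[digest] = len(self.blocks)
                    self.blocks.append(block)
                ids.append(self.by_hash[digest])
            snapshot[name] = ids
        self.versions.append(snapshot)                   # older snapshots stay intact

    def restore(self, version: int, name: str) -> bytes:
        """Reassemble a file as it was in the given snapshot."""
        return b"".join(self.blocks[i] for i in self.versions[version][name])
```

    Fixed-size blocks keep the sketch short, but a single inserted byte shifts every later block boundary and defeats deduplication; content-defined boundaries avoid that.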

  • zfec

    zfec -- an efficient, portable erasure coding tool

    I disagree with the premise of the article. Archive formats are all inadequate for long-term resilience and making them adequate would be a violation of the “do one thing and do it right” principle.

    To support resilience, you don’t need an alternative to xz, you need hashes and forward error correction. Specifically, compress your file using xz for high compression ratio, optionally encrypt it, then take a SHA-256 hash to be used for detecting errors, then generate parity files using PAR[1] or zfec[2] to correct errors.

    [1] https://wiki.archlinux.org/title/Parchive

    [2] https://github.com/tahoe-lafs/zfec
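
    The compress, hash, and protect pipeline from this comment can be sketched with the standard library. The optional encryption step is omitted, and the erasure code is deliberately the simplest one: a single XOR parity shard over `k` data shards, which can rebuild any one lost shard. PAR2 and zfec use Reed-Solomon-style codes that tolerate multiple losses, so `protect` and `recover` are hypothetical illustrations, not their APIs.

```python
import hashlib
import lzma
from functools import reduce
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def protect(data: bytes, k: int = 4) -> Tuple[List[bytes], str, int]:
    """Compress with xz, hash, split into k shards and append one XOR parity shard."""
    compressed = lzma.compress(data)                      # xz container by default
    digest = hashlib.sha256(compressed).hexdigest()       # used later to detect errors
    shard_len = -(-len(compressed) // k)                  # ceiling division
    padded = compressed.ljust(shard_len * k, b"\0")       # pad so shards are equal size
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)                    # XOR of all data shards
    return shards + [parity], digest, len(compressed)


def recover(shards: List[Optional[bytes]], digest: str, length: int) -> bytes:
    """Rebuild at most one missing shard (None), verify SHA-256, then decompress."""
    shards = list(shards)                                 # don't mutate the caller's list
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild only one lost shard")
    if missing:
        # XOR over all k+1 shards is zero, so the lost one is the XOR of the rest.
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    compressed = b"".join(shards[:-1])[:length]           # drop parity shard and padding
    if hashlib.sha256(compressed).hexdigest() != digest:
        raise ValueError("SHA-256 mismatch: data is corrupt beyond what parity fixed")
    return lzma.decompress(compressed)
```

    A lost shard is passed as `None`; the SHA-256 check then confirms that the rebuilt data matches what was originally compressed before decompression is attempted.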

  • BorgBackup

    Deduplicating archiver with compression and authenticated encryption.

    Both Borg [0] and Restic [1] have long-standing open issues for error correction, but seem to consider it out of scope. I find that decision strange, since to me the whole purpose of a backup solution is to restore your system to a correct state after any kind of incident.

    My current solution is an assembly of shell scripts that combine borg with par2, but I'm rather unhappy with it. For one, I trust my home-brewed solution only faintly (similar to `don't roll your own crypto`, I think there should be an adage `don't roll your own backup solution`). In addition, I think an error-correcting mechanism should also be available to less tech-savvy users.

    [0]: https://github.com/borgbackup/borg/issues/225

    [1]: https://github.com/restic/restic/issues/256
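
    One piece of a borg-plus-par2 setup could be written in Python instead of shell, roughly as follows. This sketch assumes the Borg repository is a plain local directory and that par2cmdline is installed. It only builds `par2 create -r<n> <file>.par2 <file>` commands for each repository file, skipping existing `.par2` files, and prints them unless `dry_run` is turned off. The 10% redundancy level and the function names are illustrative assumptions, and the preceding `borg create` step is left out.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Iterable, List

REDUNDANCY_PERCENT = 10  # assumed level; par2cmdline takes it as -r<n>


def repo_files(repo: Path) -> List[Path]:
    """All regular files under the (assumed plain-directory) Borg repository."""
    return [p for p in repo.rglob("*") if p.is_file()]


def par2_commands(files: Iterable[Path], redundancy: int = REDUNDANCY_PERCENT) -> List[List[str]]:
    """One `par2 create` command per file, skipping existing recovery files."""
    commands: List[List[str]] = []
    for path in sorted(files):
        if path.suffix == ".par2":  # also skips par2's *.volNNN+NN.par2 volume files
            continue
        target = path.with_name(path.name + ".par2")
        commands.append(["par2", "create", f"-r{redundancy}", str(target), str(path)])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

    For example, `run(par2_commands(repo_files(Path("backup-repo"))))` would print the commands for a hypothetical `backup-repo` directory. A matching check would run `par2 verify` on each `.par2` file, and `par2 repair` only when verification fails.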

  • restic

    Fast, secure, efficient backup program

