Pigz: Parallel gzip for modern multi-processor, multi-core machines

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.
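As a rough illustration of what "parallel gzip" means, here is a minimal Python sketch. It is not pigz's actual algorithm. It splits the input into 128 KiB chunks (pigz's default block size), compresses each chunk on a thread pool, and concatenates the results. pigz emits a single gzip stream and, by default, primes each block with the last 32 KiB of the previous one. This sketch instead writes each chunk as an independent gzip member. The gzip format allows several members in one file, so standard decompressors still accept the output, at some cost in compression ratio. The function name parallel_gzip is hypothetical.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

# pigz's default block size is 128 KiB; reused here as the chunk size.
CHUNK_SIZE = 128 * 1024


def _compress_member(chunk: bytes, level: int) -> bytes:
    # Each chunk becomes a complete, self-contained gzip member
    # (header + deflate data + CRC-32/length trailer).
    return gzip.compress(chunk, compresslevel=level)


def parallel_gzip(data: bytes, level: int = 6, workers: int = 4,
                  chunk_size: int = CHUNK_SIZE) -> bytes:
    """Compress data as a multi-member gzip file, one member per chunk."""
    # Always produce at least one member so empty input is still valid gzip.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    # CPython's zlib module releases the GIL while deflating, so the threads
    # should be able to compress different chunks on different cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so members are joined in sequence.
        members = pool.map(_compress_member, chunks, [level] * len(chunks))
        return b"".join(members)


if __name__ == "__main__":
    original = b"pigz " * 100_000
    packed = parallel_gzip(original)
    assert gzip.decompress(packed) == original
    print(len(original), "->", len(packed), "bytes")
```

Because every member is independent, `gzip -d` or Python's `gzip.decompress` can read the result without knowing how it was produced.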

  • solaris-userland

    Open Source software in Solaris using gmake based build system to drive building various software components.

  • You can grab the version from the solaris-userland repo I linked and use it without me completing a homework assignment. Just grab the pigz-2.3.4 source, then apply the patches from [1] in the proper order. Some of them may not be needed on non-Solaris systems.

    1. https://github.com/oracle/solaris-userland/tree/master/compo...

    I thought I had opened a PR for that a long while ago, but it doesn't show up on GitHub these days. In any case, I did ask Mark Adler to review it. It was never a priority, and then the code changed in ways I don't really want to deal with.

    While looking through the PRs, I noticed a PR for Blocked GZip Format (BGZF) [2]. That's very interesting, and perhaps suggests that bgzip is a tool you would be interested in.

    2. https://github.com/madler/pigz/pull/19

  • DirectStorage

    DirectStorage for Windows is an API that allows game developers to unlock the full potential of high speed NVMe drives for loading game assets.

  • If you are interested in optimizing parallel decompression and you happen to have a suitable NVIDIA GPU, GDeflate [1] is interesting. The target market for this is PC games using DirectStorage to quickly load game assets. The graph in [1] shows DirectStorage maxing out the throughput of a PCIe Gen 3 drive at about 3 GiB/s when compression is not used.

    If you have suitable hardware running Windows, you can try this out for yourself using Microsoft's DirectStorage GPU decompression benchmark [2].

    A reference implementation of a single threaded compressor and multi (CPU) threaded decompressor can be found at [3]. It is Apache-2 licensed.

    1. https://developer.nvidia.com/blog/accelerating-load-times-fo...

    2. https://github.com/microsoft/DirectStorage/tree/main/Samples...

    3. https://github.com/microsoft/DirectStorage/blob/main/GDeflat...

    Disclaimer: I work for NVIDIA, have nothing to do with this, and am not speaking for NVIDIA.

  • rapidgzip

    Gzip Decompression and Random Access for Modern Multi-Core Machines

  • I have implemented not only parallel decompression but also random access to offsets in the stream with https://github.com/mxmlnkn/pragzip. I did some benchmarks on some really beefy machines with 128 cores and was able to reach almost 20 GB/s decompression bandwidth. The single-core decoder still has lots of potential for optimization, though, because I had to write it from scratch.

  • Moby

    The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems

  • Useful with Docker, see https://github.com/moby/moby/pull/35697

    I’ve integrated pigz into different build and CI pipelines a few times. Don’t expect wonders since some steps still need to run serially, but a few seconds here and there might still add up to a few minutes on a large build.

  • zip.js

    JavaScript library to zip and unzip files supporting multi-core compression, compression streams, zip64, split files and encryption.

  • Similarly, if people are interested, I have implemented multi-core compression of zip files in zip.js [1]. The approach is simpler: it compresses the entries in parallel. It still offers a significant performance gain when compressing multiple files into a zip file, which is the usual case.

    [1] https://github.com/gildas-lormeau/zip.js

  • isa-l

    Intelligent Storage Acceleration Library

  • containerd

    An open and reliable container runtime

  • Containerd will utilize unpigz if it’s on your PATH, thank me later: https://github.com/containerd/containerd/blob/main/archive/c...

  • zindex

    libznz with zindex (by zrajna)

  • Interesting. It looks like https://github.com/zrajna/zindex became public about a year after my searches for parallel uncompression came up empty and I started hacking on pigz.

  • TurboBench

    Compression Benchmark

  • Build TurboBench [1] or download executables for Linux and Windows from the releases page [2], and run your own tests comparing Oodle, zstd, and other compressors.

    [1] https://github.com/powturbo/TurboBench

    [2] https://github.com/powturbo/TurboBench/releases

  • pixz

    Parallel, indexed xz compressor

  • That's really confusing, since `pixz` exists and its "pixie" pronunciation actually works.

    https://github.com/vasi/pixz

  • pigz

    A parallel implementation of gzip for modern multi-processor, multi-core machines.

  • nvcomp

    Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
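Several entries above (rapidgzip, zindex, and the BGZF pull request) concern random access into compressed data. The simplest version of the idea, assumed here rather than taken from any of those projects, is to compress fixed-size blocks as independent gzip members. An index then records where each block starts in both the uncompressed and compressed streams, and a read decompresses only the blocks it overlaps. Real BGZF also stores a block-size field in each member's gzip header and ends with an end-of-file marker block. rapidgzip, by contrast, works on ordinary gzip files that were not written this way. Neither detail is modeled below, and the names compress_with_index and read_at are hypothetical.

```python
import bisect
import gzip

# Hypothetical block size, loosely mirroring BGZF's 64 KiB block limit.
BLOCK_SIZE = 64 * 1024


def compress_with_index(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress data as independent gzip members and index their offsets.

    Returns (blob, index), where index[i] = (uncompressed_start, compressed_start).
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        index.append((start, len(blob)))
        blob += gzip.compress(data[start:start + block_size])
    return bytes(blob), index


def read_at(blob: bytes, index, offset: int, length: int) -> bytes:
    """Return data[offset:offset + length], inflating only overlapping blocks."""
    if length <= 0 or not index:
        return b""
    starts = [u for u, _ in index]
    # Last block whose uncompressed start is <= offset.
    i = max(bisect.bisect_right(starts, offset) - 1, 0)
    first = starts[i]
    end = offset + length
    pieces = []
    while i < len(index) and starts[i] < end:
        c_begin = index[i][1]
        c_end = index[i + 1][1] if i + 1 < len(index) else len(blob)
        # Each member is a complete gzip file, so it decompresses on its own.
        pieces.append(gzip.decompress(blob[c_begin:c_end]))
        i += 1
    window = b"".join(pieces)
    return window[offset - first:end - first]
```

Smaller blocks make random reads cheaper but hurt the compression ratio, because each member starts with an empty history window.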

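The containerd comment describes a detect-and-fall-back pattern: use unpigz if it is on PATH, and otherwise decompress in-process. A minimal Python sketch of that pattern follows. It is not containerd's code (containerd is written in Go), and gunzip_bytes is a hypothetical name. It relies only on unpigz's -d (decompress) and -c (write to stdout) options and on unpigz reading standard input when no file is given.

```python
import gzip
import shutil
import subprocess


def gunzip_bytes(data: bytes) -> bytes:
    """Decompress gzip data, preferring an external unpigz when available."""
    unpigz = shutil.which("unpigz")
    if unpigz is not None:
        # -d: decompress, -c: write to stdout. With no file arguments,
        # unpigz reads the compressed stream from stdin.
        result = subprocess.run([unpigz, "-d", "-c"], input=data,
                                stdout=subprocess.PIPE, check=True)
        return result.stdout
    # Fallback: Python's built-in, single-threaded gzip module.
    return gzip.decompress(data)
```

Per the pigz documentation, unpigz still inflates on a single thread and uses extra threads only for reading, writing, and check calculation. The gain is therefore smaller than for compression, which matches the "don't expect wonders" remark in the Moby entry.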