How Gzip Works

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • tiralabra

    gzip decompressor in ~300 lines of readable python

    If you prefer reading Python, I implemented the decompressor not too long ago: https://github.com/LaihoE/tiralabra

  • pigz

    A parallel implementation of gzip for modern multi-processor, multi-core machines.

    Parallel compression (pigz [0]) and decompression (rapidgzip [1]), for one. When you're dealing with multi-TB files, this is a big deal.

    [0]: https://github.com/madler/pigz

    [1]: https://github.com/mxmlnkn/rapidgzip

  • rapidgzip

    Gzip Decompression and Random Access for Modern Multi-Core Machines

  • deflate-frolicking

    Analyse and modify DEFLATE streams

    Even if it doesn't use block-based compression, if there isn't a huge range of corrupted bytes, corruption offsets are usually identifiable, as you will quickly end up with invalid length-distance pairs and similar errors. Errors might, however, be reported a few bytes after the actual corruption.

    I was motivated some years ago to try recovering from these errors [1] when I was handling a DEFLATE compressed JSON file, where there seemed to be a single corrupted byte every dozen or so bytes in the stream. It looked like something you could recover from. If you output decompressed bytes as the stream was parsed, you could clearly see a prefix of the original JSON being recovered up to the first corruption.

    In that case the decompressed payload was plaintext, but even with a binary format, something like kaitai-struct might give you an invalid offset to work from.

    For these localized corruptions, it's possible to just brute-force one or two bytes along this range and reliably fix the DEFLATE stream. That is not really doable once we are talking about a sequence of four or more corrupted bytes.

    [1]: https://github.com/nevesnunes/deflate-frolicking
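The single-byte recovery described in the deflate-frolicking comment can be sketched in Python. This is a minimal illustration under stated assumptions, not the method used by that project: the helper names (locate_failure, try_decompress, repair_one_byte) are hypothetical, the stream is assumed to be a gzip member with a plain 10-byte header, and the gzip CRC-32 trailer is used to decide when a candidate fix is correct. A raw DEFLATE stream has no checksum, so there another validity check (for example, parsing the recovered JSON) would be needed instead.

```python
import gzip
import json
import zlib

GZIP_HEADER_LEN = 10  # assumption: plain gzip header without optional fields


def locate_failure(blob):
    """Feed the stream one byte at a time; return (offset, output so far)."""
    decoder = zlib.decompressobj(wbits=31)  # 31 selects the gzip wrapper
    out = bytearray()
    for offset in range(len(blob)):
        try:
            out += decoder.decompress(blob[offset:offset + 1])
        except zlib.error:
            # The error may surface a few bytes after the real corruption.
            return offset, bytes(out)
    return len(blob), bytes(out)


def try_decompress(blob):
    """Return the decompressed bytes, or None if the stream is rejected."""
    try:
        return gzip.decompress(blob)  # also verifies CRC-32 and length
    except (OSError, EOFError, zlib.error):
        return None


def repair_one_byte(blob, failure_offset):
    """Walk backwards from the failure offset, trying every value per byte."""
    last = min(failure_offset, len(blob) - 1)
    for pos in range(last, GZIP_HEADER_LEN - 1, -1):
        for value in range(256):
            if value == blob[pos]:
                continue
            candidate = blob[:pos] + bytes([value]) + blob[pos + 1:]
            data = try_decompress(candidate)
            if data is not None:
                return pos, data
    return None


# Demo: compress some JSON, damage one byte in the middle of the payload.
original = json.dumps([{"id": i, "name": f"item-{i}"} for i in range(20)]).encode()
blob = gzip.compress(original)
hit = GZIP_HEADER_LEN + (len(blob) - GZIP_HEADER_LEN - 8) // 2
damaged = blob[:hit] + bytes([blob[hit] ^ 0xFF]) + blob[hit + 1:]

offset, partial = locate_failure(damaged)
result = repair_one_byte(damaged, offset)
```

The bytes returned by locate_failure begin with a correct prefix of the original data but may end in garbage produced between the corruption and the point where the decoder gives up. Searching backwards from the failure offset costs up to 255 decompression attempts per position, which is why the same approach stops being practical for longer runs of corrupted bytes.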

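The parallel compression mentioned in the pigz entry can be illustrated with a much simpler technique than pigz itself uses: compress fixed-size chunks concurrently and concatenate the results, relying on the fact that a gzip file may consist of several members back to back. The function name parallel_gzip and its parameters are hypothetical, and threads are used on the assumption that CPython's zlib releases the global interpreter lock while compressing. This sketch does not reproduce the internals of pigz.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the gzip members."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still emit one valid (empty) member
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the members stay in sequence.
        return b"".join(pool.map(gzip.compress, chunks))


payload = b"how gzip works " * 100_000
compressed = parallel_gzip(payload, chunk_size=256 * 1024)
restored = gzip.decompress(compressed)  # reads all concatenated members
```

Because every chunk is compressed independently, matches cannot reach back into the previous chunk, so the output is usually slightly larger than a single-stream compression of the same data.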