zlib-ng
rapidgzip
| | zlib-ng | rapidgzip |
|---|---|---|
| Mentions | 13 | 14 |
| Stars | 1,445 | 317 |
| Growth | 2.1% | - |
| Activity | 9.3 | 9.5 |
| Last commit | 5 days ago | 7 days ago |
| Language | C | C++ |
| License | zlib License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
zlib-ng
- Show HN: Pzip – blazing fast concurrent zip archiver and extractor
Please note that allowing a 2% larger resulting file can mean a huge speedup in these circumstances, even with the same compression routines, as seen in these benchmarks of zlib and zlib-ng at different compression levels:
https://github.com/zlib-ng/zlib-ng/discussions/871
IMO, a fair comparison of the real speed improvement brought by a new program can only be made between nearly identical resulting compressed sizes.
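The size-versus-speed trade-off the comment refers to can be observed even with stock zlib from Python's standard library. This is a minimal sketch, not the linked zlib-ng benchmarks, which measure the C libraries directly; the sample data and levels are arbitrary:

```python
import time
import zlib

# Highly compressible sample data (illustrative only; real corpora differ).
data = b"The quick brown fox jumps over the lazy dog. " * 20000

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(data)
    print(f"level {level}: {len(compressed)} bytes "
          f"({ratio:.2%} of input) in {elapsed * 1000:.2f} ms")
```

On such data, lower levels finish faster while producing somewhat larger output, which is exactly why comparing speeds at different output sizes is misleading.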
- Intel QuickAssist Technology Zstandard Plugin for Zstandard
- Introducing zune-inflate: The fastest Rust implementation of gzip/Zlib/DEFLATE
It is much faster than miniz_oxide and all other safe-Rust implementations, and consistently beats even Zlib. The performance is roughly on par with zlib-ng - sometimes faster, sometimes slower. It is not (yet) as fast as the original libdeflate in C.
- Zlib Critical Vulnerability
Zlib-ng doesn't contain the same code, but it appears that its equivalent of inflate(), when used with its inflateGetHeader() implementation, was affected by a similar problem: https://github.com/zlib-ng/zlib-ng/pull/1328
Similarly, most client code will be unaffected, because `state->head` will be NULL in programs that never call inflateGetHeader() at all.
- Git’s database internals II: commit history queries
I wonder if zlib-ng would make a difference, since it has a lot of optimizations for modern hardware.
https://github.com/zlib-ng/zlib-ng/discussions/871
- Computing Adler32 Checksums at 41 GB/s
zlib-ng also has adler32 implementations optimized for various architectures: https://github.com/zlib-ng/zlib-ng
Might be interesting to benchmark their implementation too to see how it compares.
- Convenient CPU feature detection and dispatch in the Magnum Engine
zlib-ng: https://github.com/zlib-ng/zlib-ng/blob/develop/functable.c
- games-emulation/dolphin-9999 is failing to build because the devs switched to minizip-ng while zlib uses minizip. I'm not sure how to get it to build now; details in the post.
(2) There are many packages that rely upon zlib and minizip and switching those underlying dependencies is easier said than done. We can't drop zlib completely and switch: "The idea of zlib-ng is not to replace zlib, but to co-exist as a drop-in replacement with a lower threshold for code change." - https://github.com/zlib-ng/zlib-ng
- Re: Zlib memory corruption on deflate (i.e. compress)
There are already active zlib forks (e.g. https://github.com/zlib-ng/zlib-ng); the problem is getting people to move to them. It takes a lot of effort to move mindshare from the original version to a fork. There are some historical examples of it happening, but not a ton.
rapidgzip
- Show HN: Rapidgzip – Parallel Gzip Decompressing with 10 GB/s
- Ebiggers/libdeflate: Heavily optimized DEFLATE/zlib/gzip library
I also did benchmarks with zlib and libarchivemount via their library interfaces here [0]. It has been a while since I ran them, so I forgot. Unfortunately, I did not add libdeflate.
[0] https://github.com/mxmlnkn/rapidgzip/blob/master/src/benchma...
- Rapidgzip – Parallel Decompression and Seeking in Gzip (Knespel, Brunst – 2023) [pdf]
Hi, author here.
You are right that the index is the easy mode. Over the years there have been many implementations trying to add such an index to the gzip metadata itself or as a sidecar file, bgzip probably being the best known. None of them really stuck, hence the need for a generic multi-threaded decompressor. A probably incomplete list of such implementations can be found in this issue: https://github.com/mxmlnkn/rapidgzip/issues/8
The index makes it so easy that I can simply delegate decompression to zlib. And since the paper's publication, I've actually improved upon this by delegating to ISA-L / igzip instead, which is twice as fast. This is already in the 0.8.0 release.
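The index-based approach can be illustrated with a self-contained Python sketch. This is a toy reimplementation of the idea only — not rapidgzip's actual format or code: like bgzip, it writes independent gzip members and records, for each member, where it starts in both the compressed and the uncompressed stream, so random access only has to decompress one member:

```python
import io
import zlib

CHUNK = 64 * 1024  # uncompressed bytes per independent gzip member (arbitrary)

def build_indexed_gzip(data: bytes):
    """Compress data as concatenated gzip members and record, per member,
    (compressed_offset, uncompressed_offset)."""
    out = io.BytesIO()
    index = []
    for pos in range(0, len(data), CHUNK):
        index.append((out.tell(), pos))
        comp = zlib.compressobj(wbits=31)  # wbits=31 -> gzip wrapper
        out.write(comp.compress(data[pos:pos + CHUNK]))
        out.write(comp.flush())
    return out.getvalue(), index

def read_at(compressed: bytes, index, uncompressed_offset: int) -> bytes:
    """Return data from uncompressed_offset to the end of its member,
    decompressing only that one member."""
    comp_off, uncomp_off = max(e for e in index if e[1] <= uncompressed_offset)
    member = zlib.decompressobj(wbits=31)
    # decompressobj stops at the first gzip stream end; later members
    # would land in member.unused_data and are simply ignored here.
    chunk = member.decompress(compressed[comp_off:])
    return chunk[uncompressed_offset - uncomp_off:]
```

With such an index, each member can also be handed to a different thread, which is the essence of the parallel decompression described in the paper.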
As derived from Table 1, the false positive rate for deflate blocks with dynamic Huffman codes is one per 1 Tbit / 202 ≈ 5 Gbit, or 625 MB. For non-compressed blocks, the false positive rate is roughly one per 500 KB; however, non-compressed blocks can basically be memcpied or skipped over, and then the next deflate header can be checked without much latency. For dynamic blocks, on the other hand, the whole block needs to be decompressed first to find the next one. So the much higher false positive rate for non-compressed blocks doesn't introduce that much overhead.
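The quoted spacing is plain arithmetic and can be checked directly (the 625 MB figure rounds 1 Tbit / 202 ≈ 4.95 Gbit up to 5 Gbit):

```python
# 202 false positives per Tbit of tested data means one false positive
# every ~5 Gbit, i.e. roughly 620 MB of compressed data.
false_positives_per_tbit = 202
bits_between = 1e12 / false_positives_per_tbit   # ~4.95e9 bits
megabytes_between = bits_between / 8 / 1e6       # ~619 MB
print(f"one false positive per {bits_between / 1e9:.2f} Gbit "
      f"≈ {megabytes_between:.0f} MB")
```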
I have some profiling built into rapidgzip, which is printed with -v, e.g., rapidgzip -v -d -o /dev/null 20xsilesia.tar.gz :
Time spent in block finder : 0.227751 s
- Intel QuickAssist Technology Zstandard Plugin for Zstandard
- Tool and Library for Parallel Gzip Decompression and Random Access
- Pigz: Parallel gzip for modern multi-processor, multi-core machines
I have not only implemented parallel decompression but also random access to offsets in the stream with https://github.com/mxmlnkn/pragzip. I did some benchmarks on some really beefy machines with 128 cores and was able to reach almost 20 GB/s decompression bandwidth. The single-core decoder has lots of potential for optimization, though, because I had to write it from scratch.
- Parquet: More than just “Turbo CSV”
Decompression of arbitrary gzip files can be parallelized with pragzip: https://github.com/mxmlnkn/pragzip
- The Cost of Exception Handling
At the very least, you are duplicating logic without the exception. The check for EOF has to be done implicitly inside read anyway, because read has to fill the bit buffer with data from the byte buffer, or the byte buffer with data from the file. If both fail, then we already know the EOF result, so there is no need to duplicate the EOF check in the outer loop that calls read.
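The argument — that a failed buffer refill already *is* the EOF check, so raising an exception avoids duplicating it in the caller — can be sketched in Python. This is a toy bit reader, not pragzip's actual C++ implementation:

```python
class BitReader:
    """Toy bit reader where EOF is detected inside read(): the refill path
    already knows when the byte buffer is exhausted, so the calling loop
    needs no separate EOF test."""

    def __init__(self, data: bytes):
        self.data = data
        self.byte_pos = 0
        self.bit_buffer = 0
        self.bit_count = 0

    def read(self, nbits: int) -> int:
        while self.bit_count < nbits:
            if self.byte_pos >= len(self.data):
                # Refill failed: this is the one and only EOF check.
                raise EOFError
            self.bit_buffer |= self.data[self.byte_pos] << self.bit_count
            self.byte_pos += 1
            self.bit_count += 8
        value = self.bit_buffer & ((1 << nbits) - 1)
        self.bit_buffer >>= nbits
        self.bit_count -= nbits
        return value

reader = BitReader(b"\xab")
symbols = []
try:
    while True:               # no explicit EOF test in this loop
        symbols.append(reader.read(4))
except EOFError:
    pass
print(symbols)  # low nibble first: [11, 10] for 0xab
```

Without the exception, the loop would need its own end-of-data test before every read, repeating the comparison that the refill path performs anyway.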
Here is the full commit with ad-hoc benchmark results in the commit message:
https://github.com/mxmlnkn/pragzip/commit/0b1af498377838c30f...
and here the benchmarks I ran at that time:
https://github.com/mxmlnkn/pragzip/blob/0b1af498377838c30fea...
As you can see, it's part of my random-seekable, multi-threaded gzip and bzip2 decompression libraries.
What you can also see in the commit message is that it wasn't a 50% time reduction but a 50% bandwidth increase, which translates to a roughly 33% time reduction. It seems I remembered that partly wrong, but it was still a significant optimization for me.
- How Much Faster Is Making a Tar Archive Without Gzip?
- Show HN: Thread-Parallel Decompression and Random Access to Gzip Files (Pragzip)
What are some alternatives?
zstd - Zstandard - Fast real-time compression algorithm
QATzip - Compression Library accelerated by Intel® QuickAssist Technology
ZLib - A massively spiffy yet delicately unobtrusive compression library.
pigz - A parallel implementation of gzip for modern multi-processor, multi-core machines.
Minizip-ng - Fork of the popular zip manipulation library found in the zlib distribution.
nvcomp - Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
libdeflate - Heavily optimized library for DEFLATE/zlib/gzip compression and decompression
parquet-format - Apache Parquet
brotli - Brotli compression format
DirectStorage - DirectStorage for Windows is an API that allows game developers to unlock the full potential of high speed NVMe drives for loading game assets.
uzlib - Radically unbloated DEFLATE/zlib/gzip compression/decompression library. Can decompress any gzip/zlib data, and offers simplified compressor which produces gzip-compatible output, while requiring much less resources (and providing less compression ratio of course).
pixz - Parallel, indexed xz compressor