Rapidgzip Alternatives
Similar projects and alternatives to rapidgzip
-
DirectStorage
DirectStorage for Windows is an API that allows game developers to unlock the full potential of high speed NVMe drives for loading game assets.
-
fast_float
Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12 and WebKit/Safari
-
nvcomp
Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
-
ratarmount
Access large archives as a filesystem efficiently, e.g., TAR, RAR, ZIP, GZ, BZ2, XZ, ZSTD archives
-
zip.js
JavaScript library to zip and unzip files supporting multi-core compression, compression streams, zip64, split files and encryption.
-
solaris-userland
Open Source software in Solaris using gmake based build system to drive building various software components.
rapidgzip reviews and mentions
- Show HN: Rapidgzip – Parallel Gzip Decompressing with 10 GB/S
-
Ebiggers/libdeflate: Heavily optimized DEFLATE/zlib/gzip library
I also did benchmarks with zlib and libarchivemount via their library interface here [0]. It has been a while since I ran them, so I forgot. Unfortunately, I did not include libdeflate.
[0] https://github.com/mxmlnkn/rapidgzip/blob/master/src/benchma...
-
Rapidgzip – Parallel Decompression and Seeking in Gzip (Knespel, Brunst – 2023) [pdf]
Hi, author here.
You are right that the index is the easy mode. Over the years there have been lots of implementations trying to add an index like that to the gzip metadata itself or as a sidecar file, with bgzip probably being the best known. None of them really stuck, hence the need for a generic multi-threaded decompressor. A probably incomplete list of such implementations can be found in this issue: https://github.com/mxmlnkn/rapidgzip/issues/8
The index makes it so easy that I can simply delegate decompression to zlib. Since the paper's publication I have improved on this by delegating to ISA-L / igzip instead, which is twice as fast. This is already in the 0.8.0 release.
As derived from Table 1, there is roughly one false positive per 1 Tbit / 202 ≈ 5 Gbit, or 625 MB, for deflate blocks with dynamic Huffman code. For non-compressed blocks, there is roughly one false positive per 500 KB; however, non-compressed blocks can basically be memcpied or skipped over, and then the next deflate header can be checked without much latency. For dynamic blocks, on the other hand, the whole block needs to be decompressed first to find the next one. So the much higher false positive rate for non-compressed blocks doesn't introduce that much overhead.
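The arithmetic behind those figures can be rechecked with a few lines of Python. This is only a sketch: it assumes that 202 is the number of false positives counted in 1 Tbit of tested data (as the quoted division suggests) and that decimal units are meant. The variable names are illustrative and not taken from rapidgzip.

```python
# Spacing between false positives of the block finder, from the figures quoted above.
# Assumption: 202 false positives were counted in 1 Tbit (10**12 bits) of test data.
TESTED_BITS = 10**12
FALSE_POSITIVES_DYNAMIC = 202

bits_per_false_positive = TESTED_BITS / FALSE_POSITIVES_DYNAMIC
megabytes_per_false_positive = bits_per_false_positive / 8 / 10**6

# Non-compressed blocks: roughly one false positive per 500 KB, as quoted above.
NON_COMPRESSED_SPACING_BYTES = 500 * 10**3
ratio = (bits_per_false_positive / 8) / NON_COMPRESSED_SPACING_BYTES

print(f"dynamic blocks: one false positive per {bits_per_false_positive / 1e9:.2f} Gbit")
print(f"dynamic blocks: one false positive per {megabytes_per_false_positive:.0f} MB")
print(f"non-compressed blocks trigger about {ratio:.0f}x more often")
```

The exact quotient is slightly below the rounded 5 Gbit and 625 MB quoted in the comment, which does not change the conclusion: false positives for non-compressed blocks are about three orders of magnitude more frequent, but each one is cheap to rule out.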
I have some profiling built into rapidgzip, which is printed with -v, e.g., rapidgzip -v -d -o /dev/null 20xsilesia.tar.gz :
Time spent in block finder : 0.227751 s
- Intel QuickAssist Technology Zstandard Plugin for Zstandard
- Tool and Library for Parallel Gzip Decompression and Random Access
-
Pigz: Parallel gzip for modern multi-processor, multi-core machines
I have not only implemented parallel decompression but also random access to offsets in the stream with https://github.com/mxmlnkn/pragzip. I did some benchmarks on some really beefy machines with 128 cores and was able to reach almost 20 GB/s decompression bandwidth. The single-core decoder still has lots of potential for optimization, though, because I had to write it from scratch.
-
Parquet: More than just “Turbo CSV”
Decompression of arbitrary gzip files can be parallelized with pragzip: https://github.com/mxmlnkn/pragzip
-
The Cost of Exception Handling
At the very least you are duplicating logic without the exception. The check for eof has to be done implicitly anyway inside read because it has to fill the bit buffer with data from the byte buffer or the byte buffer with data from the file. And if both fail, then we already know the result of eof, so no need to duplicate checking for eof in the outer read calling loop.
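The duplicated check described above can be illustrated with a small Python stand-in. This is a sketch of the control flow only, not the author's C++ code: `ByteReader`, `total_with_eof_check`, and `total_with_exception` are hypothetical names, and Python does not reproduce the cost characteristics of C++ exceptions.

```python
import io


class ByteReader:
    """Hypothetical reader, a stand-in for a bit reader backed by a byte buffer."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)
        self._size = len(data)

    def eof(self) -> bool:
        # Explicit end-of-data query for callers that check before reading.
        return self._stream.tell() >= self._size

    def read(self) -> int:
        chunk = self._stream.read(1)
        if not chunk:
            # The refill failed, so end-of-data is already known at this point.
            raise EOFError("no more data")
        return chunk[0]


def total_with_eof_check(reader: ByteReader) -> int:
    total = 0
    while not reader.eof():      # check 1: in the outer loop
        total += reader.read()   # check 2: again inside read()
    return total


def total_with_exception(reader: ByteReader) -> int:
    total = 0
    try:
        while True:
            total += reader.read()  # the only end-of-data check
    except EOFError:
        pass
    return total


data = bytes([1, 2, 3, 4])
print(total_with_eof_check(ByteReader(data)), total_with_exception(ByteReader(data)))
```

Both loops compute the same sum. The second one leaves end-of-data detection to the check that `read` has to perform anyway when refilling, which is the point made in the comment.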
Here is the full commit with ad-hoc benchmark results in the commit message:
https://github.com/mxmlnkn/pragzip/commit/0b1af498377838c30f...
and here the benchmarks I ran at that time:
https://github.com/mxmlnkn/pragzip/blob/0b1af498377838c30fea...
As you can see, it's part of my random-seekable multi-threaded gzip and bzip2 parallel decompression libraries.
What you can also see in the commit message is that it wasn't a 50% time reduction but a 50% bandwidth increase, which would translate to a 30% time reduction. It seems I remembered that partly wrong. But it still was a significant optimization for me.
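The conversion between a bandwidth increase and a time reduction is easy to recheck. A minimal sketch, assuming a fixed amount of data so that runtime is inversely proportional to bandwidth:

```python
# Converting a bandwidth increase into a runtime reduction for a fixed data size.
bandwidth_gain = 0.5                      # 50% more bandwidth, as stated above
time_ratio = 1 / (1 + bandwidth_gain)     # new runtime relative to the old runtime
time_reduction = 1 - time_ratio

print(f"time reduction: {time_reduction:.1%}")  # prints "time reduction: 33.3%"
```

Under that assumption the reduction comes out at about one third, which the comment rounds to 30%. A 50% time reduction would instead require doubling the bandwidth.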
- How Much Faster Is Making a Tar Archive Without Gzip?
- Show HN: Thread-Parallel Decompression and Random Access to Gzip Files (Pragzip)
Stats
mxmlnkn/rapidgzip is an open source project licensed under the Apache License 2.0, which is an OSI-approved license.
The primary programming language of rapidgzip is C++.