| | rapidgzip | ryu |
|---|---|---|
| Mentions | 14 | 12 |
| Stars | 324 | 1,162 |
| Growth | - | - |
| Activity | 9.5 | 5.9 |
| Last commit | 11 days ago | 3 months ago |
| Language | C++ | C++ |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
rapidgzip
- Show HN: Rapidgzip – Parallel Gzip Decompressing with 10 GB/s
- Ebiggers/libdeflate: Heavily optimized DEFLATE/zlib/gzip library
I also did benchmarks with zlib and libarchive via their library interfaces here [0]. It has been a while since I ran them, so I forget the details. Unfortunately, I did not add libdeflate.
[0] https://github.com/mxmlnkn/rapidgzip/blob/master/src/benchma...
- Rapidgzip – Parallel Decompression and Seeking in Gzip (Knespel, Brunst – 2023) [pdf]
Hi, author here.
You are right that the index is the easy mode. Over the years there have been many implementations that try to add such an index to the gzip metadata itself or as a sidecar file, bgzip probably being the best known. None of them really stuck, hence the need for a generic multi-threaded decompressor. A probably incomplete list of such implementations can be found in this issue: https://github.com/mxmlnkn/rapidgzip/issues/8
The index makes it so easy that I can simply delegate decompression to zlib. And since the paper was published, I've actually improved on this by delegating to ISA-L / igzip instead, which is twice as fast. This is already in the 0.8.0 release.
As derived from Table 1, the false positive rate for deflate blocks with dynamic Huffman codes is about 202 per Tbit, i.e., one per roughly 5 Gbit or 625 MB. For non-compressed blocks, the false positive rate is roughly one per 500 KB; however, non-compressed blocks can basically be memcpied or skipped over, and the next deflate header can then be checked with little latency. For dynamic blocks, on the other hand, the whole block needs to be decompressed before the next one can be found. So the much higher false positive rate for non-compressed blocks doesn't introduce that much overhead.
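The spacing between false positives follows from simple arithmetic; a quick back-of-the-envelope check, assuming the ~202 false positives per Tbit quoted from the paper:

```python
# Rough check of the false-positive spacing for dynamic-Huffman deflate blocks,
# assuming ~202 false positives per Tbit (value taken from the comment above).
bits_per_tbit = 1e12
false_positives = 202

bits_between = bits_per_tbit / false_positives   # ~4.95e9 bits, i.e. ~5 Gbit
mb_between = bits_between / 8 / 1e6              # ~619 MB, roughly the 625 MB quoted

print(f"{bits_between / 1e9:.2f} Gbit, {mb_between:.0f} MB between false positives")
```

The quoted "625 MB" comes from rounding 5 Gbit up before converting to bytes; the exact figure is a bit lower.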
I have some profiling built into rapidgzip, which is printed with `-v`, e.g., `rapidgzip -v -d -o /dev/null 20xsilesia.tar.gz`:

    Time spent in block finder : 0.227751 s
- Intel QuickAssist Technology Zstandard Plugin for Zstandard
- Tool and Library for Parallel Gzip Decompression and Random Access
- Pigz: Parallel gzip for modern multi-processor, multi-core machines
I have not only implemented parallel decompression but also random access to offsets in the stream with https://github.com/mxmlnkn/pragzip. I did some benchmarks on some really beefy machines with 128 cores and was able to reach almost 20 GB/s decompression bandwidth. The single-core decoder still has lots of potential for optimization, though, because I had to write it from scratch.
- Parquet: More than just “Turbo CSV”
Decompression of arbitrary gzip files can be parallelized with pragzip: https://github.com/mxmlnkn/pragzip
- The Cost of Exception Handling
At the very least, you are duplicating logic without the exception. The EOF check has to be done implicitly inside read anyway, because read has to refill the bit buffer from the byte buffer, or the byte buffer from the file. If both fail, then we already know the result of eof, so there is no need to check for EOF again in the outer loop that calls read.
Here is the full commit with ad-hoc benchmark results in the commit message:
https://github.com/mxmlnkn/pragzip/commit/0b1af498377838c30f...
and here are the benchmarks I ran at that time:
https://github.com/mxmlnkn/pragzip/blob/0b1af498377838c30fea...
As you can see, it's part of my random-seekable, multi-threaded gzip and bzip2 decompression libraries.
What you can also see in the commit message is that it wasn't a 50% time reduction but a 50% bandwidth increase, which translates to roughly a 33% time reduction. It seems I remembered that partly wrong, but it was still a significant optimization for me.
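The idea can be sketched in Python (an illustrative toy reader, not the actual C++ code from the commit): the inner read signals exhaustion with an exception, so the outer loop needs no duplicated EOF check.

```python
class ByteReader:
    """Toy reader: raises EOFError instead of exposing a separate eof() check."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read_byte(self) -> int:
        # The refill path already knows when data runs out, so signal it with
        # an exception instead of making every caller pre-check eof().
        if self.pos >= len(self.data):
            raise EOFError
        b = self.data[self.pos]
        self.pos += 1
        return b

def count_bytes(reader: ByteReader) -> int:
    count = 0
    try:
        while True:          # no duplicated eof() check in the calling loop
            reader.read_byte()
            count += 1
    except EOFError:
        pass
    return count

print(count_bytes(ByteReader(b"hello")))  # 5
```

In the hot loop this trades a per-iteration branch for one exception at the very end of the stream, which is the source of the measured speedup.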
- How Much Faster Is Making a Tar Archive Without Gzip?
- Show HN: Thread-Parallel Decompression and Random Access to Gzip Files (Pragzip)
ryu
- Printing double aka the most difficult problem in computer sciences
Nah. This is about ryu printf.
- Parquet: More than just “Turbo CSV”
> Google put in significant engineering effort into "Ryu", a parsing library for double-precision floating point numbers: https://github.com/ulfjack/ryu
It's not a parsing library but a printing one, i.e., double → string. https://github.com/fastfloat/fast_float is a parsing library, i.e., string → double. It is not by Google, but it was indeed motivated by fast JSON parsing: https://lemire.me/blog/2020/03/10/fast-float-parsing-in-prac...
- Faster way to convert double to string, not using "%f"?
- After obtaining a CS degree and 16 years of experience in industry, I feel somewhat confident that I can answer your programming questions correctly. Ask me anything
Me and Ryu agree that the answer should be 0.30000000000000004
- 23 years into my career, I still love PHP and JavaScript
Apparently exact minimal float-to-string conversion is more recent than I thought, and many languages used to print more (Python?) or fewer (PHP) decimal digits than necessary to uniquely identify the bit pattern. Python correctly prints 46000.80 + 553.04 as 46553.840000000004, but I don't know whether it ever prints more digits than needed. One recent algorithm for printing floats exactly is https://github.com/ulfjack/ryu, though I don't know what the current state of the art is (https://github.com/jk-jeon/dragonbox claims to be the best algorithm and includes benchmarks).
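The Python behavior described here is easy to check: since Python 3.1, repr picks the shortest decimal string that round-trips to the exact same double, the same guarantee Ryū provides.

```python
# Shortest-round-trip printing: the claim from the comment above.
x = 46000.80 + 553.04
print(repr(x))  # per the comment above: 46553.840000000004

# The trailing digits are not noise: dropping them would name a different double.
assert float(repr(x)) == x       # the printed string round-trips exactly
assert x != 46553.84             # the "obvious" short value is a different double
```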
- What's the most elegant algo in your subjective view and why?
On the huge-speedup side, there is the Ryū algorithm for decimal conversion (video, source), which is now finding its way into most standard libraries. But it isn't a hack; it's a very dense, complex, and precise algorithm, nothing like the fast-and-loose inverse square root.
- C++ devs at FAANG companies, what kind of work do you do?
Used a wizard's magic to print "3.14" faster
- how to make ftoa procedure from scratch
Here's a paper that details an optimized algorithm (reference implementation). It also contains a description of a correct but slow algorithm, as well as references to classic papers on the subject. Earlier, the classic implementation was David Gay's dtoa, included in netlib.
- Dragonbox 1.1.0 is released (a fast float-to-string conversion algorithm)
At the very core of all this theory lies the theory of continued fractions. It is an immensely useful monster which I even dare to call the ultimate tool for floating-point formatting/parsing; everyone who wants to contribute to this field should learn it. Before I learned continued fractions, my main tool for proving things was the minmax Euclid algorithm (one of the great contributions of the wonderful Ryū paper), but it turns out that it is just a fairly straightforward application of the theory of continued fractions. The main role the minmax Euclid algorithm played was to estimate the maximum size of possible errors, but with continued fractions it is even possible to list all examples that generate errors above a given threshold. This is something I desperately wanted but couldn't do back in 2020.
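The continued-fraction machinery referred to here is easy to sketch (illustrative Python, not code from Dragonbox or Ryū): the partial quotients of a ratio such as 2^k / 10^n control how close multiples of one quantity can come to multiples of the other, which is exactly the error-bounding question the minmax Euclid algorithm answers.

```python
from fractions import Fraction

def continued_fraction(x: Fraction, terms: int = 8) -> list[int]:
    """Return up to `terms` partial quotients of x's continued fraction."""
    quotients = []
    for _ in range(terms):
        a = x.numerator // x.denominator   # integer part = next partial quotient
        quotients.append(a)
        frac = x - a
        if frac == 0:                      # expansion terminates for rationals
            break
        x = 1 / frac                       # recurse on the reciprocal remainder
    return quotients

# Classic sanity check: 355/113 (a famous approximation of pi) expands to [3, 7, 16].
print(continued_fraction(Fraction(355, 113)))

# The kind of power-of-two over power-of-ten ratio that appears in float
# formatting; its partial quotients bound the approximation errors.
print(continued_fraction(Fraction(2**52, 10**15)))
```

The convergents built from these quotients are the best rational approximations to the ratio, which is why they enumerate the worst-case inputs for a given error threshold.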
- FastDoubleParser: Java port of Daniel Lemire's fast_double_parser
The Ryū algorithm, which does the converse (doubles to strings), is also much faster than Java's number formatting classes.
https://github.com/ulfjack/ryu/blob/master/src/main/java/inf...
What are some alternatives?
pigz - A parallel implementation of gzip for modern multi-processor, multi-core machines.
dragonbox - Reference implementation of Dragonbox in C++
DirectStorage - DirectStorage for Windows is an API that allows game developers to unlock the full potential of high speed NVMe drives for loading game assets.
C++ Format - A modern formatting library
QATzip - Compression Library accelerated by Intel® QuickAssist Technology
concise-encoding - The secure data format for a modern world
parquet-format - Apache Parquet
proust - Compiling implementation of mustache
nvcomp - Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
itoa - Fast integer to ascii / integer to string conversion
pixz - Parallel, indexed xz compressor
oss-fuzz - OSS-Fuzz - continuous fuzzing for open source software.