fast_float vs rapidgzip

fast_float

Fast and exact implementation of the C++ from_chars functions for number types: 4x to 10x faster than strtod, part of GCC 12 and WebKit/Safari (by fastfloat)

rapidgzip

Gzip Decompression and Random Access for Modern Multi-Core Machines (by mxmlnkn)

CLI CPP Cpp17 cpp17-library Decompression Gzip gzip-decompression Library Parallel python-library Python3 random-access Thread header-only Command-line Command Line Tool

Project	fast_float	rapidgzip
Mentions	15	14
Stars	1,277	317
Growth	1.5%	-
Activity	8.7	9.5
Latest Commit	about 1 month ago	12 days ago
Language	C++	C++
License	Apache License 2.0	Apache License 2.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

fast_float

Posts with mentions or reviews of fast_float. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-03.

Parquet: More than just “Turbo CSV”
7 projects | news.ycombinator.com | 3 Apr 2023

> Google put in significant engineering effort into "Ryu", a parsing library for double-precision floating point numbers: https://github.com/ulfjack/ryu
It's not a parsing library, but a printing one, i.e., double -> string. https://github.com/fastfloat/fast_float is a parsing library, i.e., string -> double, not by Google though, but was indeed motivated by parsing JSON fast https://lemire.me/blog/2020/03/10/fast-float-parsing-in-prac...
What do number conversions (from string) cost?
1 project | /r/cpp | 20 Mar 2023

For those that don't know, gcc 12.x updated its float parsing logic to something similar to fast_float and it's about 1/6 of the cost presented here (sub 100 in the graph presented here). Strongly suggest using that library or upgrading the compiler if you need the performance.
Can sanitizers find the two bugs I wrote in C++?
11 projects | news.ycombinator.com | 8 Feb 2023

This makes sense for integers but beware floating point from_chars - libc++ still doesn't implement it and libstdc++ implements it by wrapping locale-dependent libc functions, which involves temporarily changing the thread locale and possibly allocating memory to make the passed string 0-terminated. IMO libstdc++'s checkbox "solution" is worse than not implementing it at all - users are better off using Lemire's API-compatible fast_float implementation [0].
[0] https://github.com/fastfloat/fast_float
Passing Programs To A Stack Machine
1 project | /r/cpp_questions | 11 Nov 2021

I'm a bit stuck on how to do the same thing in c++, due to containers only having a single type. The very inefficient way I'm currently doing it is by passing a program as a vector of strings, and then converting the string constants to doubles with the fast_float library.
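One way around the single-type container limitation mentioned above is to store each instruction as a `std::variant`, so constants are parsed to `double` once instead of being kept as strings. The following is only a sketch under that assumption: `Op`, `Instruction`, and `run` are hypothetical names, not taken from the post, and the four arithmetic operators are an arbitrary choice.

```cpp
#include <stdexcept>
#include <variant>
#include <vector>

// Hypothetical instruction set for illustration only.
enum class Op { Add, Sub, Mul, Div };

// Each program element is either a numeric constant or an operator.
using Instruction = std::variant<double, Op>;

// Evaluate a postfix program and return the single value left on the stack.
double run(const std::vector<Instruction>& program) {
    std::vector<double> stack;
    for (const Instruction& ins : program) {
        // Constants are pushed directly; no string conversion at run time.
        if (const double* constant = std::get_if<double>(&ins)) {
            stack.push_back(*constant);
            continue;
        }
        if (stack.size() < 2) {
            throw std::runtime_error("stack underflow");
        }
        const double rhs = stack.back();
        stack.pop_back();
        const double lhs = stack.back();
        stack.pop_back();
        switch (std::get<Op>(ins)) {
            case Op::Add: stack.push_back(lhs + rhs); break;
            case Op::Sub: stack.push_back(lhs - rhs); break;
            case Op::Mul: stack.push_back(lhs * rhs); break;
            case Op::Div: stack.push_back(lhs / rhs); break;
        }
    }
    if (stack.size() != 1) {
        throw std::runtime_error("program must leave exactly one value");
    }
    return stack.back();
}
```

A program such as `std::vector<Instruction>{2.0, 3.0, Op::Add, 4.0, Op::Mul}` would then be passed to `run` directly. String parsing, with fast_float or otherwise, would only be needed once when the program is first read.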
Parsing can become accidentally quadratic because of sscanf
2 projects | /r/programming | 3 Oct 2021

Just above this comment is a merged PR, which references fast_float library: https://github.com/fastfloat/fast_float
Making Rust Float Parsing Fast: libcore Edition
10 projects | /r/rust | 17 Jul 2021

Daniel Lemire @lemire (creator of the algorithm, author of the C++ implementation, and provider of constant feedback that helped guide the PR).
RapidObj v0.1 - A fast, header-only, C++17 library for parsing Wavefront .obj files.
4 projects | /r/cpp | 28 Jun 2021

And out of 6,000 lines in the file, at least 3000 are other people's code: earcut for polygon triangulation and fast_float because .obj files typically contain a lot of floating point numbers so it's important to parse them quickly.
First release of dragonbox, a fast float-to-string conversion algorithm, is available
3 projects | /r/cpp | 22 May 2021

How does this compare to https://github.com/fastfloat/fast_float ?
Why is std::from_chars<float> slow?
1 project | /r/cpp | 11 May 2021

I tried to compare it against Daniel Lemire's excellent fast_float library. Fast float took about 180ms for the same program, and all I did was change the "std" namespace prefix to "fast_float". It's a factor of 12 difference, at least on my machine. I tried MSVC next, and it is a lot better, but it is still ~4 times slower than fast float. AFAIK, clang currently does not implement the feature at all.
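The namespace swap described above can be sketched as follows. This is a minimal sketch, not code from the post: `parse_double` is a hypothetical helper, and it calls `std::from_chars` so that it builds with only the standard library (this needs a standard library that implements the floating-point overloads, for example a recent libstdc++ or MSVC). Assuming fast_float is available as `fast_float/fast_float.h`, replacing `std::from_chars` with `fast_float::from_chars` should be the only change required.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parse all of `text` as a double.
// Returns std::nullopt on a parse error or if trailing characters remain.
std::optional<double> parse_double(std::string_view text) {
    double value = 0.0;
    const char* first = text.data();
    const char* last = text.data() + text.size();

    // Assumption: fast_float::from_chars accepts the same arguments and
    // returns a result with the same `ptr` and `ec` members, so only the
    // namespace prefix on this line would change.
    auto [ptr, ec] = std::from_chars(first, last, value);

    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return value;
}
```

Unlike `strtod`, this call takes an explicit end pointer, so the input does not have to be null-terminated, and it does not consult the current locale.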
Iterator invalidation of std::string_view
1 project | /r/cpp | 12 Feb 2021

If you don't mind a 3rd party lib until your stdlib updates, https://github.com/fastfloat/fast_float is best-in-class.

rapidgzip

Posts with mentions or reviews of rapidgzip. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-04.

Show HN: Rapidgzip – Parallel Gzip Decompressing with 10 GB/S
3 projects | news.ycombinator.com | 4 Sep 2023
Ebiggers/libdeflate: Heavily optimized DEFLATE/zlib/gzip library
5 projects | news.ycombinator.com | 26 Aug 2023

I also did benchmarks with zlib and libarchivemount via their library interface here [0]. It has been a while since I ran them, so I forgot. Unfortunately, I did not add libdeflate.
[0] https://github.com/mxmlnkn/rapidgzip/blob/master/src/benchma...
Rapidgzip – Parallel Decompression and Seeking in Gzip (Knespel, Brunst – 2023) [pdf]
3 projects | news.ycombinator.com | 21 Aug 2023

Hi, author here.
You are right that having the index is the easy mode. Over the years there have been lots of implementations trying to add an index like that to the gzip metadata itself or as a sidecar file, with bgzip probably being the best-known one. None of them really stuck, hence the necessity for some generic multi-threaded decompressor. A probably incomplete list of such implementations can be found in this issue: https://github.com/mxmlnkn/rapidgzip/issues/8
The index makes it so easy that I can simply delegate decompression to zlib. And since paper publication I've actually improved upon this by delegating to ISA-l / igzip instead, which is twice as fast. This is already in the 0.8.0 release.
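To illustrate why an index makes random access easy, here is a rough sketch of a seek-point lookup. It is not rapidgzip's actual data structure: `SeekPoint` and `find_seek_point` are hypothetical names, and a real gzip index would also have to store the preceding window of decompressed data for each point so that back-references can be resolved.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical index entry: where a deflate block starts in the compressed
// stream and which uncompressed offset its output begins at.
struct SeekPoint {
    std::uint64_t compressed_bit_offset;
    std::uint64_t uncompressed_offset;
};

// Return the last seek point at or before `target`.
// Assumes `index` is sorted by uncompressed_offset.
std::optional<SeekPoint> find_seek_point(const std::vector<SeekPoint>& index,
                                         std::uint64_t target) {
    // upper_bound finds the first point strictly after `target`.
    auto it = std::upper_bound(
        index.begin(), index.end(), target,
        [](std::uint64_t offset, const SeekPoint& point) {
            return offset < point.uncompressed_offset;
        });
    if (it == index.begin()) {
        return std::nullopt;  // No seek point covers this offset.
    }
    return *std::prev(it);
}
```

With such a lookup, a reader would jump to the returned compressed offset and hand the stream to an ordinary decoder from there, discarding output until the requested uncompressed offset is reached.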
As derived from table 1, the false positive rate for deflate blocks with dynamic Huffman code is roughly one per 1 Tbit / 202 ≈ 5 Gbit, or about 625 MB. For non-compressed blocks, the false positive rate is roughly one per 500 KB; however, non-compressed blocks can basically be memcpied or skipped over, and then the next deflate header can be checked without much latency. For dynamic blocks, on the other hand, the whole block needs to be decompressed first to find the next one. So the much higher false positive rate for non-compressed blocks doesn't introduce that much overhead.
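The arithmetic behind the dynamic-block figure can be written out as below. This assumes that 202 is the number of false positives counted over 1 Tbit of tested data; table 1 of the paper is not reproduced here, so that reading is an assumption.

```latex
\frac{10^{12}\,\text{bit}}{202} \approx 4.95 \times 10^{9}\,\text{bit} \approx 5\,\text{Gbit},
\qquad
\frac{5 \times 10^{9}\,\text{bit}}{8\,\text{bit/B}} = 6.25 \times 10^{8}\,\text{B} = 625\,\text{MB}
```

Without the intermediate rounding to 5 Gbit, the result is about 619 MB per false positive, so the quoted 625 MB is a rounded figure.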
I have some profiling built into rapidgzip, which is printed with -v, e.g., rapidgzip -v -d -o /dev/null 20xsilesia.tar.gz :
```
    Time spent in block finder              : 0.227751 s
```
Intel QuickAssist Technology Zstandard Plugin for Zstandard
10 projects | news.ycombinator.com | 16 Aug 2023
Tool and Library for Parallel Gzip Decompression and Random Access
1 project | news.ycombinator.com | 12 May 2023
Pigz: Parallel gzip for modern multi-processor, multi-core machines
15 projects | news.ycombinator.com | 12 May 2023

I have not only implemented parallel decompression but also random access to offsets in the stream with https://github.com/mxmlnkn/pragzip. I did some benchmarks on some really beefy machines with 128 cores and was able to reach almost 20 GB/s decompression bandwidth. The single-core decoder has lots of potential for optimization because I had to write it from scratch, though.
Parquet: More than just “Turbo CSV”
7 projects | news.ycombinator.com | 3 Apr 2023

Decompression of arbitrary gzip files can be parallelized with pragzip: https://github.com/mxmlnkn/pragzip
The Cost of Exception Handling
1 project | news.ycombinator.com | 13 Nov 2022

At the very least you are duplicating logic without the exception. The check for eof has to be done implicitly anyway inside read because it has to fill the bit buffer with data from the byte buffer or the byte buffer with data from the file. And if both fail, then we already know the result of eof, so no need to duplicate checking for eof in the outer read calling loop.
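The duplicated check described above can be sketched with a toy reader. This is not the code from the linked commit: `ByteReader`, `EndOfInput`, and the two summing functions are hypothetical names, and the sketch makes no claim about which variant is faster on any given machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical exception signalling that no more data is available.
struct EndOfInput : std::runtime_error {
    EndOfInput() : std::runtime_error("end of input") {}
};

// Toy reader over an in-memory buffer.
class ByteReader {
public:
    explicit ByteReader(std::vector<std::uint8_t> data)
        : data_(std::move(data)) {}

    bool eof() const { return pos_ >= data_.size(); }

    // read() has to bounds-check anyway, so it already detects EOF itself.
    std::uint8_t read() {
        if (pos_ >= data_.size()) {
            throw EndOfInput{};
        }
        return data_[pos_++];
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};

// Variant 1: the loop repeats the EOF check that read() performs internally.
std::uint64_t sum_with_eof_check(ByteReader reader) {
    std::uint64_t sum = 0;
    while (!reader.eof()) {
        sum += reader.read();
    }
    return sum;
}

// Variant 2: the loop relies on the exception, so EOF is checked only once
// per read, inside read().
std::uint64_t sum_with_exception(ByteReader reader) {
    std::uint64_t sum = 0;
    try {
        for (;;) {
            sum += reader.read();
        }
    } catch (const EndOfInput&) {
        // Expected: reached the end of the input.
    }
    return sum;
}
```

Both variants produce the same result; the difference is only whether the common path pays for one bounds check per byte or two.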
Here is the full commit with ad-hoc benchmark results in the commit message:
https://github.com/mxmlnkn/pragzip/commit/0b1af498377838c30f...
and here the benchmarks I ran at that time:
https://github.com/mxmlnkn/pragzip/blob/0b1af498377838c30fea...
As you can see, it's part of my random-seekable multi-threaded gzip and bzip2 parallel decompression libraries.
What you can also see in the commit message is that it wasn't a 50% time reduction but a 50% bandwidth increase, which would translate to a 30% time reduction. It seems I remembered that partly wrong. But it still was a significant optimization for me.
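The conversion between a bandwidth increase and a time reduction can be written out as follows, assuming a fixed amount of data $S$ processed at bandwidth $B$ before and $B' = 1.5\,B$ after the change (the symbols are introduced here for illustration).

```latex
t = \frac{S}{B}, \qquad
t' = \frac{S}{1.5\,B} = \frac{t}{1.5} \approx 0.667\,t, \qquad
\frac{t - t'}{t} = 1 - \frac{1}{1.5} \approx 33\%
```

So a 50% bandwidth increase corresponds to roughly a one-third reduction in run time, in line with the approximate 30% quoted above. A 50% time reduction would instead require doubling the bandwidth.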
How Much Faster Is Making a Tar Archive Without Gzip?
8 projects | news.ycombinator.com | 10 Oct 2022
Show HN: Thread-Parallel Decompression and Random Access to Gzip Files (Pragzip)
1 project | news.ycombinator.com | 6 Aug 2022

What are some alternatives?

When comparing fast_float and rapidgzip you can also consider the following projects:

dragonbox - Reference implementation of Dragonbox in C++

pigz - A parallel implementation of gzip for modern multi-processor, multi-core machines.

rapidobj - A fast, header-only, C++17 library for parsing Wavefront .obj files.

DirectStorage - DirectStorage for Windows is an API that allows game developers to unlock the full potential of high speed NVMe drives for loading game assets.

C++ Format - A modern formatting library

QATzip - Compression Library accelerated by Intel® QuickAssist Technology

fast-float-rust - Super-fast float parser in Rust (now part of Rust core)

parquet-format - Apache Parquet

RapidJSON - A fast JSON parser/generator for C++ with both SAX/DOM style API

nvcomp - Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

simdutf8 - SIMD-accelerated UTF-8 validation for Rust.

pixz - Parallel, indexed xz compressor
