| | dietgpu | nvcomp |
|---|---|---|
| Mentions | 4 | 7 |
| Stars | 294 | 528 |
| Growth | 3.4% | 1.7% |
| Activity | 4.3 | 4.8 |
| Last Commit | 22 days ago | 7 months ago |
| Language | Cuda | C++ |
| License | MIT License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
dietgpu
- The Simple Beauty of XOR Floating Point Compression
https://computing.llnl.gov/projects/floating-point-compressi...
but it tends to be very application specific: it relies on high correlation / small deltas between neighboring values in a 2D/3D/4D/etc. floating point array (e.g., you are compressing neighboring temperature grid points in a PDE weather simulation model; temperatures in neighboring cells won't differ by much).
In a lot of other cases (e.g., machine learning) the floating point significand bits (and sometimes the sign bit) tend to be incompressible noise. The exponent is the only thing that is really compressible, and the XOR trick does not help as much because neighboring values can still vary in exponent. An entropy encoder works well there instead (it encodes closer to the actual underlying data distribution/entropy), and it does not depend on neighboring floats having similar exponents.
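To make the two regimes concrete, here is a small self-contained Python sketch (my own illustration, not code from either library): it measures how many leading zero bits the XOR of neighboring float32 bit patterns has for smooth grid-like data, and compares the byte entropy of the exponent bits vs. the low mantissa bits for noisy ML-like data.

```python
import math
import random
import struct

def bits(x: float) -> int:
    """IEEE-754 float32 bit pattern as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def byte_entropy(vals) -> float:
    """Shannon entropy (in bits) of a sequence of byte values."""
    n = len(vals)
    counts = {}
    for b in vals:
        counts[b] = counts.get(b, 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

random.seed(0)

# Smooth "temperature grid" data: neighbors differ slightly, so the XOR
# of consecutive bit patterns has a long run of leading zeros -- the
# structure delta-style compressors exploit.
smooth = [20.0 + 0.001 * i for i in range(1000)]
xors = [bits(a) ^ bits(b) for a, b in zip(smooth, smooth[1:])]
avg_leading_zeros = sum(32 - x.bit_length() for x in xors) / len(xors)

# ML-like data: mantissa bits are noise, exponents cluster, so only the
# exponent byte is worth entropy coding.
ml = [random.gauss(0.0, 1.0) for _ in range(10_000)]
exp_entropy = byte_entropy([(bits(v) >> 23) & 0xFF for v in ml])
low_entropy = byte_entropy([bits(v) & 0xFF for v in ml])

print(f"smooth data: avg leading zeros per XOR delta = {avg_leading_zeros:.1f}/32")
print(f"ML-like data: exponent-byte entropy = {exp_entropy:.2f}/8 bits")
print(f"ML-like data: low-mantissa-byte entropy = {low_entropy:.2f}/8 bits")
```

On the smooth series the XOR deltas are mostly zeros; on the Gaussian data the exponent byte carries only a couple of bits of entropy while the low mantissa byte is essentially incompressible.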
In 2022, I created dietgpu, a library to losslessly compress/decompress floating point data at up to 400 GB/s on an A100. It uses a general-purpose asymmetric numeral system encoder/decoder on GPU (the first such implementation of general ANS on GPU, predating nvCOMP) for exponent compression.
We have used this to losslessly compress floating point data sent between GPUs (e.g., over InfiniBand/NVLink/Ethernet) when training massive ML models, speeding up overall wall-clock training time across hundreds or thousands of GPUs without changing anything about how the training works (it's lossless compression, so it computes exactly what it did before).
https://github.com/facebookresearch/dietgpu
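The float mode described above works along these lines: peel off the byte holding the exponent bits and entropy-code it with ANS, while the sign and mantissa bits pass through raw. Below is a rough CPU-side sketch of that split; it is my own simplification, and the names and exact bit layout are illustrative, not dietgpu's actual format.

```python
import struct
from typing import List, Tuple

def split_float32(vals: List[float]) -> Tuple[bytes, bytes]:
    """Split float32 words into an exponent-byte stream (low entropy,
    worth entropy coding) and a sign+mantissa stream (usually noise)."""
    hi, lo = bytearray(), bytearray()
    for v in vals:
        w = struct.unpack("<I", struct.pack("<f", v))[0]
        hi.append((w >> 23) & 0xFF)                # 8 exponent bits
        m = (w & 0x7FFFFF) | ((w >> 31) << 23)     # 23 mantissa bits + sign
        lo += m.to_bytes(3, "little")
    return bytes(hi), bytes(lo)

def join_float32(hi: bytes, lo: bytes) -> List[float]:
    """Inverse of split_float32: reassemble the original floats."""
    out = []
    for e, i in zip(hi, range(0, len(lo), 3)):
        m = int.from_bytes(lo[i:i + 3], "little")
        w = ((m >> 23) << 31) | (e << 23) | (m & 0x7FFFFF)
        out.append(struct.unpack("<f", struct.pack("<I", w))[0])
    return out
```

In this sketch only the `hi` stream would go through the ANS coder; `lo` is stored verbatim, since it is mostly incompressible noise.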
- Parallelising Huffman decoding and x86 disassembly by synchronising prefix codes
ANS is super fast and trivially parallelizable, faster than Huffman and especially arithmetic coding. It is fast because it can be machine-word oriented (you read/write whole machine words at a time, not arbitrary variable-bit-length sequences), and as a result you can interleave any number of independent (parallel) encoders in the same stream, with just a prefix sum to figure out where to write the state normalization values. I for one got up to 400 GB/s throughput on A100 GPUs in my implementation (https://github.com/facebookresearch/dietgpu).
ANS can also self-synchronize.
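For a feel of why the inner loop is so cheap, here is a minimal single-state rANS codec in Python. This is my own sketch, not dietgpu's code: GPU implementations interleave many such states in one stream (using a prefix sum to lay out the renormalization bytes), but the per-symbol arithmetic has this same shape. The 8-bit probability scale and names are illustrative choices.

```python
PROB_BITS = 8                 # symbol frequencies sum to 1 << PROB_BITS
PROB_SCALE = 1 << PROB_BITS
RANS_L = 1 << 16              # lower bound of the normalized state

def rans_encode(data, freq, cum):
    """Encode symbols with one rANS state, byte-wise renormalization.
    freq[s] is the frequency of s; cum[s] the cumulative frequency."""
    x, out = RANS_L, bytearray()
    for s in reversed(data):                       # rANS encodes in reverse
        x_max = ((RANS_L >> PROB_BITS) << 8) * freq[s]
        while x >= x_max:                          # renormalize: emit low byte
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // freq[s]) << PROB_BITS) | ((x % freq[s]) + cum[s])
    return x, bytes(reversed(out))                 # decoder reads forward

def rans_decode(x, stream, n, freq, cum):
    """Recover n symbols from the final state x and the byte stream."""
    sym_of = [s for s in range(len(freq)) for _ in range(freq[s])]
    out, pos = [], 0
    for _ in range(n):
        slot = x & (PROB_SCALE - 1)                # low bits select the symbol
        s = sym_of[slot]
        x = freq[s] * (x >> PROB_BITS) + slot - cum[s]
        while x < RANS_L:                          # refill from the stream
            x = (x << 8) | stream[pos]
            pos += 1
        out.append(s)
    return out
```

Note that both directions touch only whole bytes/words (a divide, shifts, and masks per symbol), never arbitrary bit positions, which is what makes it practical to run many independent states in parallel.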
- How to defend when the patent office grants somebody a monopoly on your work (e.g. found on GitHub)? Defend JPEG XL from a granted ANS patent? (author here)
In the case of the rANS patent, besides individual donations, organizations blocked by the patent could also donate - e.g. JPEG, Google, Nvidia, Facebook.
- DietGPU: Fast ANS Codec for Nvidia GPUs
nvcomp
- Pigz: Parallel gzip for modern multi-processor, multi-core machines
- GDeflate: An Open GPU Compression Standard
- NVIDIA talks up RTX IO with GDeflate (used in DirectStorage 1.1) to speed up games
- Nvidia's nvCOMP now supports Zstd compression and decompression in GPU
- DirectStorage 1.1 Coming Soon - GPU Decompression
NVIDIA nvcomp had GDeflate significantly earlier on the GPGPU side: https://github.com/NVIDIA/nvcomp. nvcomp releases before 2.3 were open-source.
- How to defend when the patent office grants somebody a monopoly on your work (e.g. found on GitHub)? Defend JPEG XL from a granted ANS patent? (author here)
In the case of the rANS patent, besides individual donations, organizations blocked by the patent could also donate - e.g. JPEG, Google, Nvidia, Facebook.
- Community Event: "nvCOMP: a CUDA library for Fast Lossless Compression and Decompression on GPUs"
Your link to https://github.com/NVIDIA/nvcomp, is broken because of the comma at the end.
What are some alternatives?
gpuhd - Massively Parallel Huffman Decoding on GPUs
rapidgzip - Gzip Decompression and Random Access for Modern Multi-Core Machines
DirectStorage - DirectStorage for Windows is an API that allows game developers to unlock the full potential of high speed NVMe drives for loading game assets.
TurboBench - Compression Benchmark
pigz - A parallel implementation of gzip for modern multi-processor, multi-core machines.
isa-l - Intelligent Storage Acceleration Library
solaris-userland - Open Source software in Solaris using gmake based build system to drive building various software components.
zindex - Create an index on a compressed text file
containerd - An open and reliable container runtime
Moby - The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
pixz - Parallel, indexed xz compressor
zip.js - JavaScript library to zip and unzip files supporting multi-core compression, compression streams, zip64, split files and encryption.