-
dietgpu
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
ANS is super fast and trivially parallelizable, faster than Huffman and especially arithmetic coding. It is fast because it can be machine-word oriented (you read/write whole machine words at a time, not arbitrary variable-length bit sequences), and as a result you can interleave any number of independent (parallel) encoders in the same stream, with just a prefix sum to figure out where each one writes its state normalization values. I got up to 400 GB/s throughput on A100 GPUs in my implementation (https://github.com/facebookresearch/dietgpu).
ANS can also self-synchronize.
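To make the interleaving point concrete, here is a minimal sketch in plain Python rather than CUDA. It is not dietgpu's actual format or code: the 4 lanes, 16-bit words, 12-bit probability scale, and the names `encode`, `decode`, and `exclusive_scan` are all assumptions picked for illustration. Each lane is an independent rANS coder that emits at most one whole word per symbol, and an exclusive prefix sum over the per-lane "I have a word" flags decides where each lane writes (or reads) in the shared stream. On a GPU the inner loops over lanes would be threads and the scan would be a warp or block primitive.

```python
from itertools import accumulate

PROB_BITS = 12         # assumed: frequencies sum to 1 << PROB_BITS
WORD_BITS = 16         # assumed: renormalization moves one 16-bit word
LOW = 1 << WORD_BITS   # lane states stay in [LOW, LOW << WORD_BITS)
LANES = 4              # assumed number of interleaved coders


def exclusive_scan(flags):
    """Exclusive prefix sum: output slot for each lane that has a word."""
    return [0] + list(accumulate(flags))[:-1]


def encode(symbols, freq, cum):
    """Return (final lane states, shared word stream)."""
    states = [LOW] * LANES
    chunks = []
    # ANS is last-in first-out, so walk the groups of LANES symbols backwards.
    for start in reversed(range(0, len(symbols), LANES)):
        group = symbols[start:start + LANES]
        limit = (LOW >> PROB_BITS) << WORD_BITS
        # A lane must flush one word if encoding would overflow its state.
        flags = [int(states[k] >= limit * freq[s]) for k, s in enumerate(group)]
        offsets = exclusive_scan(flags)
        chunk = [0] * sum(flags)
        for k, s in enumerate(group):
            x = states[k]
            if flags[k]:
                chunk[offsets[k]] = x & (LOW - 1)  # write a whole word
                x >>= WORD_BITS
            states[k] = ((x // freq[s]) << PROB_BITS) + (x % freq[s]) + cum[s]
        chunks.append(chunk)
    # The decoder runs forwards, so put the groups back in forward order.
    words = [w for chunk in reversed(chunks) for w in chunk]
    return states, words


def decode(states, words, n, freq, cum):
    """Inverse of encode(); n is the number of symbols to recover."""
    states = list(states)
    lookup = [s for s, f in enumerate(freq) for _ in range(f)]  # slot -> symbol
    mask = (1 << PROB_BITS) - 1
    out, pos = [], 0
    for start in range(0, n, LANES):
        active = min(LANES, n - start)
        flags = []
        for k in range(active):
            slot = states[k] & mask
            s = lookup[slot]
            out.append(s)
            states[k] = freq[s] * (states[k] >> PROB_BITS) + slot - cum[s]
            flags.append(int(states[k] < LOW))  # lane needs a refill word
        offsets = exclusive_scan(flags)
        for k in range(active):
            if flags[k]:
                states[k] = (states[k] << WORD_BITS) | words[pos + offsets[k]]
        pos += sum(flags)
    return out


# Toy 4-symbol alphabet; frequencies are made up and sum to 1 << PROB_BITS.
freq = [2048, 1024, 512, 512]
cum = [0, 2048, 3072, 3584]
message = [(i * i + i // 3) % 4 for i in range(203)]
states, words = encode(message, freq, cum)
assert decode(states, words, len(message), freq, cum) == message
```

The only cross-lane communication in each step is the scan, which is why adding more lanes costs so little.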
https://github.com/weissenberger/gpuhd
The authors of this repo/paper use the self-synchronizing property of almost all Huffman codes to implement parallel Huffman decoding on the GPU at ~10 GB/s. In practice, I haven't found this useful when the data starts out on the CPU, since the cost of the GPU round trip outweighs the GPU's decoding speed. But if your data is already on the GPU, this is a really cool way to do Huffman decoding.
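For anyone unfamiliar with self-synchronization, here is a toy sketch of the property itself. It is not the gpuhd algorithm; the 4-symbol code and the helper names `decode_from` and `sync_point` are made up for illustration. A decoder that starts at the wrong bit offset emits some garbage at first, but soon lands on a real codeword boundary and agrees with the true parse from there on. That is what lets many threads start decoding at arbitrary offsets.

```python
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}  # toy prefix code
DECODE = {bits: sym for sym, bits in CODE.items()}


def decode_from(bits, start):
    """Greedy decode from bit offset `start`; return (symbols, boundaries)."""
    symbols, boundaries, word = [], [], ""
    for i in range(start, len(bits)):
        word += bits[i]
        if word in DECODE:
            symbols.append(DECODE[word])
            boundaries.append(i + 1)  # bit offset just past this codeword
            word = ""
    return "".join(symbols), boundaries


def sync_point(bits, start):
    """First offset where a decoder started at `start` rejoins the true parse."""
    true_ends = set(decode_from(bits, 0)[1])
    for end in decode_from(bits, start)[1]:
        if end in true_ends:
            return end
    return None  # never resynchronized within this bit string


bits = "".join(CODE[s] for s in "dcabad")
print(decode_from(bits, 0)[0])  # correct parse
print(decode_from(bits, 1)[0])  # started one bit late
print(sync_point(bits, 1))      # where the late decoder resynchronizes
```

With this particular code, every `0` bit ends a codeword, so a misaligned decoder resynchronizes at the first `0` it sees. Here the late decoder yields `dbabad` instead of `dcabad`: it gets two symbols wrong, rejoins the true parse at bit offset 6, and the trailing `abad` matches.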