The Simple Beauty of XOR Floating Point Compression

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • dietgpu

    GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.

  • https://computing.llnl.gov/projects/floating-point-compressi...

    but it tends to be very application-specific: it works where there is high correlation (small deltas) between neighboring values in a 2D/3D/4D/etc. floating point array (e.g., when you compress neighboring temperature grid points in a PDE weather simulation model, temperatures in neighboring cells won't differ by much).

    In a lot of other cases (e.g., machine learning), the floating point significand bits (and sometimes the sign bit) tend to be incompressible noise. The exponent is the only part that is really compressible, and the XOR trick does not help as much because neighboring values can still differ in their exponents. An entropy encoder works well here instead: it encodes closer to the actual underlying data distribution/entropy, and it does not depend on neighboring floats having similar exponents.

    In 2022, I created dietgpu, a library to losslessly compress/decompress floating point data at up to 400 GB/s on an A100. It uses a general-purpose asymmetric numeral system encoder/decoder on GPU (the first such implementation of general ANS on GPU, predating nvCOMP) for exponent compression.

    We have used this to losslessly compress floating point data between GPUs (e.g., over Infiniband/NVLink/ethernet/etc) in training massive ML models to speed up overall wall clock time of training across 100s/1000s of GPUs without changing anything about how the training works (it's lossless compression, it computes the same thing that it did before).

    https://github.com/facebookresearch/dietgpu
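
The contrast drawn in the comment above can be illustrated with a small sketch. This is not dietgpu's code or API; every function name here is hypothetical, the two datasets are synthetic assumptions (a slowly varying ramp standing in for simulation grid data, Gaussian samples standing in for ML values), and Shannon entropy is used only as an estimate of what an entropy coder such as ANS could approach.

```python
import math
import random
import struct
from collections import Counter


def float_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def avg_leading_zeros_after_xor(values) -> float:
    """XOR each 32-bit pattern with its predecessor and average the
    number of leading zero bits (more zeros = cheaper to store)."""
    bits = [float_bits(v) for v in values]
    xored = [a ^ b for a, b in zip(bits, bits[1:])]
    return sum(32 - w.bit_length() for w in xored) / len(xored)


def shannon_entropy(symbols) -> float:
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def exponent_field(x: float) -> int:
    """The 8-bit biased exponent of a binary32 value."""
    return (float_bits(x) >> 23) & 0xFF


def low_mantissa_byte(x: float) -> int:
    """The lowest 8 bits of the 23-bit significand field."""
    return float_bits(x) & 0xFF


rng = random.Random(0)

# Assumed "simulation-like" data: neighboring values differ by small deltas.
smooth = [20.0 + 0.001 * i for i in range(1000)]

# Assumed "ML-like" data: zero-mean Gaussian samples, uncorrelated neighbors.
noisy = [rng.gauss(0.0, 0.02) for _ in range(4000)]

smooth_lz = avg_leading_zeros_after_xor(smooth)
noisy_lz = avg_leading_zeros_after_xor(noisy)
exp_entropy = shannon_entropy(exponent_field(v) for v in noisy)
mant_entropy = shannon_entropy(low_mantissa_byte(v) for v in noisy)

print(f"avg leading zeros after XOR: smooth={smooth_lz:.1f}, noisy={noisy_lz:.1f}")
print(f"noisy data, bits/symbol: exponent={exp_entropy:.2f}, "
      f"low mantissa byte={mant_entropy:.2f}")
```

On the smooth data, XOR with the previous value leaves many leading zero bits, which is what the XOR scheme exploits. On the noisy data it leaves few, while the 8-bit exponent field on its own has an entropy well below 8 bits and the low significand bits stay close to 8 bits per byte. That is consistent with entropy-coding only the exponent, as the comment describes.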


Related posts

  • CUDA Checkpoint and Restore

    1 project | news.ycombinator.com | 30 Apr 2024
  • Ask HN: Yo Nephew, in E. Africa, wants to train an LLM with on disk Wikipedia

    1 project | news.ycombinator.com | 24 Apr 2024
  • Show HN: One Billion Rows in CUDA

    1 project | news.ycombinator.com | 13 Apr 2024
  • Show HN: Faster sorting with register shuffling in CUDA

    1 project | news.ycombinator.com | 15 Mar 2024
  • Raft: Fundamental widely-used algorithms and primitives for machine learning

    1 project | news.ycombinator.com | 22 Feb 2024