Time-Series Compression Algorithms

FastPFor

4 839 8.5 C++

The FastPFOR C++ library: Fast integer compression

One notable omission from this piece is a technique to compress integer time series with both positive and negative values.
If you naively apply bit-packing using the Simple8b algorithm, you'll find that negative integers are not compressed. This is due to how signed integers are represented in modern computers: in two's complement, a negative integer has its most significant bit set, so even a small negative value occupies the full bit width [1].
Zigzag encoding is a neat transform that circumvents this issue. It maps signed integers to unsigned integers so that numbers with a small absolute value can be encoded using a small number of bits. Put another way, it stores the sign in the least significant bit. [2]
If you're looking for a quick way to experiment with various time series compression algorithms, I highly recommend Daniel Lemire's FastPFor repository [3] (as linked in the article). I've used the Python bindings [4] to quickly evaluate various compression algorithms with great success.
Finally, I'd like to humbly mention my own tiny contribution [5], an adaptation of Lemire's C++ Simple8b implementation (including basic methods for delta and zigzag encoding/decoding).
I used C++ templates to make the encoding and decoding routines generic over integer bit-width, which extends support up to 64-bit integers and allows efficient use with smaller integers (e.g. 16-bit). I made a couple of other minor tweaks, including support for arrays up to 2^64 in length and API/method signatures that can be used in a more functional style. This implementation is slightly simpler to invoke via FFI, and I intend to add examples showing how to compile it for use from JS (WebAssembly), Python, and C#.
I threw my code up quickly in order to share it with you all; hopefully someone finds it useful. I intend to expand on usage examples, test cases, etc., and am looking forward to any comments or contributions.
[1] https://en.wikipedia.org/wiki/Signed_number_representation
[2] https://en.wikipedia.org/wiki/Variable-length_quantity#Zigza...
[3] https://github.com/lemire/FastPFor
[4] https://github.com/searchivarius/PyFastPFor
[5] https://github.com/naturalplasmoid/simple8b-timeseries-compr...
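
The zigzag transform described in the comment above can be sketched in a few lines of Python. This is a minimal illustration, not code taken from FastPFor or from the commenter's repository; the function names and the default 64-bit width are assumptions made for the example.

```python
def zigzag_encode(n: int, bits: int = 64) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    mask = (1 << bits) - 1
    # Python's >> on a negative int is an arithmetic shift, so n >> (bits - 1)
    # is -1 (all ones) for negative n and 0 otherwise.
    return ((n << 1) ^ (n >> (bits - 1))) & mask


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode: the lowest bit carries the sign."""
    return (z >> 1) ^ -(z & 1)


# A small negative value viewed as a 64-bit two's complement pattern
# needs all 64 bits, while its zigzag form needs only a few.
raw_bits = (-3 & 0xFFFFFFFFFFFFFFFF).bit_length()
zigzag_bits = zigzag_encode(-3).bit_length()
```

After the transform, values close to zero on either side become small unsigned numbers, which is what a bit-packing scheme such as Simple8b needs in order to pack many values per word.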

PyFastPFor

2 56 4.6 C++

Python bindings for the fast integer compression library FastPFor.

simple8b-timeseries-compression

1 7 10.0 C++


corpuscompression

2 7 0.0 Java

Achieve better compression for small objects with a predefined corpus

I've always liked this kind of thing. I've also done some experiments with automatically memorizing better starting dictionaries for corpora where the data is small:
https://github.com/spullara/corpuscompression
Another interesting case is using a compressed stream to flag anomalies in the data when the compression ratio drops sharply, as in log analysis.
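
To make the idea of a predefined starting dictionary concrete, here is a minimal sketch using the preset-dictionary support in Python's standard zlib module. It is not the method used by corpuscompression (which is a Java project); the hand-written dictionary, the sample message, and the helper names are all assumptions for illustration.

```python
import zlib

# Hypothetical dictionary: byte strings expected to recur in every small message.
PRESET_DICT = b'{"sensor_id": "", "temperature_c": , "humidity_pct": , "status": "ok"}'


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Deflate `data`, letting the compressor reference `zdict` as prior context."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inverse of compress_with_dict; the same dictionary must be supplied."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


message = b'{"sensor_id": "kitchen-3", "temperature_c": 21.5, "humidity_pct": 40, "status": "ok"}'
plain_size = len(zlib.compress(message, 9))
dict_size = len(compress_with_dict(message, PRESET_DICT))
```

A short message has little internal repetition for a general-purpose compressor to exploit, so most of the gain here comes from matches against the shared dictionary. Both sides must agree on the dictionary ahead of time.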

banyan

1 36 0.0 Rust

I have found that a very good approach is to apply some very simple transformations, such as delta encoding of timestamps, and then let a good standard compression algorithm such as zstd or deflate take care of the rest.
Delta encoding of timestamps helps a lot, though, because it makes the redundancy more visible to a general-purpose compression algorithm.
I used this for the telemetry storage of the Columbus module of the International Space Station, back in ~2010, and then a few times since.
http://blog.klaehn.org/2018/06/10/efficient-telemetry-storag...
https://github.com/Actyx/banyan
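
The delta-then-compress approach described above can be sketched as follows. This is an illustration only, not code from banyan or from the linked blog post: it uses deflate via Python's standard zlib module, and the synthetic millisecond timestamps and helper names are assumptions.

```python
import struct
import zlib


def delta_encode(values):
    """Replace each value with its difference from the previous one (first is kept as-is)."""
    prev = 0
    deltas = []
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    total = 0
    values = []
    for d in deltas:
        total += d
        values.append(total)
    return values


def pack_int64(values):
    """Serialize integers as little-endian signed 64-bit words."""
    return struct.pack(f"<{len(values)}q", *values)


# Synthetic timestamps: roughly one sample per second with a few ms of jitter.
timestamps = []
t = 1_600_000_000_000
for i in range(10_000):
    t += 1000 + (i * 7) % 5
    timestamps.append(t)

raw_size = len(zlib.compress(pack_int64(timestamps), 9))
delta_size = len(zlib.compress(pack_int64(delta_encode(timestamps)), 9))
```

In the raw stream every 8-byte word is different, so deflate finds only partial matches. After delta encoding the words are drawn from a handful of small values, which is exactly the kind of redundancy a general-purpose compressor handles well.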

interpolative_coding

1 27 0.0 C++

A flexible and efficient C++ implementation of the Binary Interpolative Coding algorithm.

I didn't see binary interpolative coding (BIC) referenced. It is one of my favorites, introduced to me by the book "Managing Gigabytes" by Witten, Moffat, and Bell [1]. It achieves a great compression ratio for sorted integer sequences and is commonly used in inverted indexes.
There is a neat implementation [2] and technical paper [3] by Giulio Ermanno Pibiri, which I just found today by looking for it.
[1] https://people.eng.unimelb.edu.au/ammoffat/mg/
[2] https://github.com/jermp/interpolative_coding
[3] http://pages.di.unipi.it/pibiri/papers/BIC.pdf
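
For readers unfamiliar with BIC, the recursion can be sketched as below. This is a simplified illustration and not the implementation from the linked repository: it emits plain fixed-width binary codes rather than the minimal binary codes typically used in practice, it builds a string of '0'/'1' characters for readability, and all names are assumptions. The input must be strictly increasing and lie within [lo, hi].

```python
def _bits_needed(range_size: int) -> int:
    """Bits required to distinguish range_size possible values (0 if only one)."""
    return (range_size - 1).bit_length()


def bic_encode(values, lo, hi):
    """Encode a strictly increasing list whose elements all lie in [lo, hi]."""
    bits = []

    def rec(l, r, lo, hi):
        # Encode values[l:r], known to lie in [lo, hi].
        if l >= r:
            return
        m = (l + r) // 2
        # m - l smaller elements must fit below values[m], r - 1 - m above it.
        low = lo + (m - l)
        high = hi - (r - 1 - m)
        width = _bits_needed(high - low + 1)
        if width:
            bits.append(format(values[m] - low, f"0{width}b"))
        rec(l, m, lo, values[m] - 1)
        rec(m + 1, r, values[m] + 1, hi)

    rec(0, len(values), lo, hi)
    return "".join(bits)


def bic_decode(bitstring, n, lo, hi):
    """Decode n values produced by bic_encode with the same lo and hi."""
    out = [0] * n
    pos = 0

    def rec(l, r, lo, hi):
        nonlocal pos
        if l >= r:
            return
        m = (l + r) // 2
        low = lo + (m - l)
        high = hi - (r - 1 - m)
        width = _bits_needed(high - low + 1)
        offset = int(bitstring[pos:pos + width], 2) if width else 0
        pos += width
        out[m] = low + offset
        rec(l, m, lo, out[m] - 1)
        rec(m + 1, r, out[m] + 1, hi)

    rec(0, n, lo, hi)
    return out


sparse = [2, 9, 12, 14, 20]
encoded = bic_encode(sparse, 0, 31)
dense_run = bic_encode([3, 4, 5, 6, 7, 8, 9, 10], 3, 10)
```

The middle element is coded first, and its value narrows the range available to each half. When a run is completely dense, as in `dense_run`, every element is implied by its bounds and no bits are emitted at all, which is why the method does so well on clustered postings lists.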


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
simd-compression compression-schemes sorted-lists compression-algorithm
Post date: 14 May 2022
