pcodec
Lossless compressor and decompressor for numerical data using quantiles (by mwlon)
spark-pancake-connector
Support for the "pancake" format in Spark (by pancake-db)
| | pcodec | spark-pancake-connector |
|---|---|---|
| Mentions | 19 | 2 |
| Stars | 248 | 5 |
| Growth | - | - |
| Activity | 8.8 | 0.0 |
| Last commit | 2 days ago | about 2 years ago |
| Language | Rust | Scala |
| License | Apache License 2.0 | Apache License 2.0 |
The number of mentions indicates the total number of mentions we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
pcodec
Posts with mentions or reviews of pcodec. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-22.
- Learnings from making things fast
Context: I've been iterating on my side project pcodec (a codec for columns of numerical data) and have gradually improved decompression speed from ~150MB/s to ~1GB/s. Not everything here is novel or Rust-specific, but here's what I've learned in the process:
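The full post doesn't fit here, but the kind of measurement behind numbers like ~1GB/s is easy to reproduce. Below is a minimal throughput harness in Rust; `decompress` is a hypothetical stand-in for whatever codec you're timing, not pcodec's actual API.

```rust
use std::time::Instant;

// A minimal sketch of how decompression throughput is typically measured:
// decompress a buffer in a loop and divide bytes produced by elapsed time.
fn throughput_mb_per_s(compressed: &[u8], decompress: impl Fn(&[u8]) -> Vec<u8>) -> f64 {
    const ITERS: u32 = 100; // repeat to amortize timer overhead
    let mut out_bytes = 0usize;
    let start = Instant::now();
    for _ in 0..ITERS {
        out_bytes += decompress(compressed).len();
    }
    let secs = start.elapsed().as_secs_f64();
    out_bytes as f64 / 1e6 / secs
}

fn main() {
    let data = vec![0u8; 1 << 20];
    // Identity "codec" just to show the harness runs; swap in a real decoder.
    let mbps = throughput_mb_per_s(&data, |b| b.to_vec());
    println!("{mbps:.0} MB/s");
}
```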
- Compressing bytes?
- Worries about tANS?
For context: I'm creating an experimental successor to my library Quantile Compression, which does good compression for numerical sequences and has several users. I have a variable number of symbols which may be as high as 2^12 in some cases, but is ~2^6 in most cases. The data is typically 2^16 to 2^24 tokens long.
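One concrete step any tANS implementation needs is normalizing raw symbol counts so they sum to a power-of-two table size. Here's a minimal sketch of that step (illustrative only, not pcodec's implementation), using 2^12 as the table size to match the largest alphabet above:

```rust
// Scale raw symbol counts to sum exactly to 2^table_log, keeping every
// observed symbol encodable. Names are illustrative, not from pcodec.
fn normalize_freqs(counts: &[u64], table_log: u32) -> Vec<u32> {
    let table_size = 1u64 << table_log;
    let total: u64 = counts.iter().sum();
    let mut freqs: Vec<u32> = counts
        .iter()
        // Scale each count to the table size, but give every observed
        // symbol at least one slot so it stays encodable.
        .map(|&c| if c == 0 { 0 } else { (c * table_size / total).max(1) as u32 })
        .collect();
    // Fix rounding drift so the frequencies sum exactly to table_size,
    // adjusting the most frequent symbol (a common, simple heuristic).
    let sum: i64 = freqs.iter().map(|&f| f as i64).sum();
    let diff = table_size as i64 - sum;
    if let Some(max_idx) = (0..freqs.len()).max_by_key(|&i| freqs[i]) {
        freqs[max_idx] = (freqs[max_idx] as i64 + diff) as u32;
    }
    freqs
}

fn main() {
    let counts = [900u64, 60, 30, 10]; // tiny alphabet for brevity
    let freqs = normalize_freqs(&counts, 12);
    assert_eq!(freqs.iter().map(|&f| f as u64).sum::<u64>(), 1 << 12);
    println!("{:?}", freqs);
}
```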
- Quantile Compression, a compression format for numerical data that improves compression ratio by ~30% over alternatives
I'm not a member, but you can use the CLI to try it out pretty easily: https://github.com/mwlon/quantile-compression/tree/main/q_compress_cli. Let me know how it does.
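For library use rather than the CLI, a round-trip sketch using the `auto_compress`/`auto_decompress` helpers documented in the q_compress README (check the repo in case the API has moved in newer versions):

```rust
use q_compress::{auto_compress, auto_decompress, DEFAULT_COMPRESSION_LEVEL};

fn main() {
    // Some synthetic numerical data to round-trip.
    let nums: Vec<i64> = (0..100_000i64).map(|i| i * i % 997).collect();
    // Compress with automatically chosen settings at the default level.
    let compressed: Vec<u8> = auto_compress(&nums, DEFAULT_COMPRESSION_LEVEL);
    // Decompress and verify losslessness.
    let recovered: Vec<i64> = auto_decompress(&compressed).expect("valid bytes");
    assert_eq!(nums, recovered);
    println!("{} -> {} bytes", nums.len() * 8, compressed.len());
}
```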
- I built Quantile Compression, which could make all our numerical columnar data 25% smaller.
You can try it out very easily with the CLI, which now works on CSV and Parquet columns, e.g. `cargo run --release compress --csv my.csv --col-name my_column out.qco`
- Quantile Compression: 35% higher compression ratio for numeric sequences than any other compressor
Right, please don't try to use it for general files. It looks like zpaq is kinda hard to set up except on Windows, so I'm probably not going to, but I encourage you to try it out! There's an example you can use to generate a bunch of random numerical distributions, outputting binary files, .qco, and other formats.
- Q_compress: Lossless compressor and decompressor for numerical data
- q_compress 0.7: still has 35% higher compression ratio than .zstd.parquet for numerical sequences, now with delta encoding and 2x faster than before
Here's how you can generate benchmark data, including binary files: https://github.com/mwlon/quantile-compression/blob/main/q_compress/examples/primary.md
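Delta encoding here means compressing consecutive differences instead of raw values, which turns smooth sequences (timestamps, counters) into small, tightly clustered numbers. A generic sketch of the idea, not q_compress's internal implementation:

```rust
// Store the first value plus consecutive differences.
fn delta_encode(nums: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(nums.len());
    let mut prev = 0i64;
    for &n in nums {
        out.push(n.wrapping_sub(prev)); // wrapping avoids overflow panics
        prev = n;
    }
    out
}

// Invert by accumulating the differences back into absolute values.
fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut acc = 0i64;
    for &d in deltas {
        acc = acc.wrapping_add(d);
        out.push(acc);
    }
    out
}

fn main() {
    let timestamps: Vec<i64> = (0..10).map(|i| 1_600_000_000 + 60 * i).collect();
    let deltas = delta_encode(&timestamps);
    assert_eq!(delta_decode(&deltas), timestamps);
    println!("{:?}", &deltas[..4]); // first value, then small diffs of 60
}
```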
- Quantile Compression, a format and algorithm for numerical sequences offering 35% higher compression ratio than .zstd.parquet.
I made a simple CLI for compressing and inspecting .qco files. Not available on package managers yet, but it's still pretty easy to try out: https://github.com/mwlon/quantile-compression/blob/main/CLI.md
- Quantile Compression (q-compress), a new compression format and rust library that shrinks real-world columns of numerical data 10-40% smaller than other methods
spark-pancake-connector
Posts with mentions or reviews of spark-pancake-connector. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-02-22.
- I built Quantile Compression, which could make all our numerical columnar data 25% smaller.
Yep. You can run the Docker image and then either use the Spark connector or the Rust client to write to it. I've seen rates as high as 50k writes/second from one EC2 instance to another. Let me know how it goes!
- I made PancakeDB, a new type of columnar DB that uses 30-50% less storage and read time than .snappy.parquet while offering efficient incremental writes
a Spark connector
What are some alternatives?
When comparing pcodec and spark-pancake-connector you can also consider the following projects:
ans-large-alphabet - Large-Alphabet Semi-Static Entropy Coding Via Asymmetric Numeral Systems
pancake-scala-client
encoding - Integer Compression Libraries for Go
pancake-core - essential libraries plus Rust client
x3-rust - X3 Lossless Audio Compression for Rust
ryg_rans - Simple rANS encoder/decoder (arithmetic coding-ish entropy coder).
gdal - GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
bitwise-compression - Trying some compression methods
TurboPFor - Fastest Integer Compression
FiniteStateEntropy - New generation entropy codecs : Finite State Entropy and Huff0