wbz
A parallel implementation of the bzip2 data compressor in Python. This data compression pipeline uses algorithms such as the Burrows–Wheeler transform (BWT) and move-to-front (MTF) to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format. (by Wittline)
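To illustrate why BWT followed by MTF helps an entropy coder such as Huffman, here is a minimal sketch of the two transforms. It is not taken from wbz; the function names, the naive rotation-sorting approach, and the `\x00` sentinel are assumptions chosen for brevity.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Naive Burrows-Wheeler transform (illustrative, not wbz's code).

    Appends a sentinel that sorts before every other character,
    sorts all rotations, and returns the last column.
    """
    if sentinel in text:
        raise ValueError("input must not contain the sentinel")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def mtf_encode(data: str, alphabet: list[str]) -> list[int]:
    """Move-to-front: emit each symbol's current index, then move it to the front."""
    table = list(alphabet)
    out = []
    for ch in data:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def mtf_decode(codes: list[int], alphabet: list[str]) -> str:
    """Inverse of mtf_encode, using the same starting alphabet."""
    table = list(alphabet)
    out = []
    for idx in codes:
        ch = table.pop(idx)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


transformed = bwt("banana")
alphabet = sorted(set(transformed))
codes = mtf_encode(transformed, alphabet)
```

BWT tends to group identical characters into runs, and MTF turns those runs into small integers (mostly zeros), which gives the Huffman stage a more skewed symbol distribution to exploit.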
libsais
libsais is a library for linear-time suffix array, longest common prefix array, and Burrows–Wheeler transform construction based on the induced sorting algorithm. (by IlyaGrebnov)
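The three structures named above are closely related, which the following sketch tries to show on a small input. It does not use libsais or its API, and it does not implement induced sorting: the suffix array is built by naive sorting (far slower than linear time), and the LCP array uses Kasai's algorithm. All function names here are hypothetical.

```python
def suffix_array(s: str) -> list[int]:
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def bwt_from_sa(s: str, sa: list[int]) -> str:
    """BWT read off the suffix array.

    Assumes s ends with a unique sentinel that sorts first.
    For i == 0, s[i - 1] wraps around to that sentinel.
    """
    return "".join(s[i - 1] for i in sa)


def lcp_array(s: str, sa: list[int]) -> list[int]:
    """Kasai's algorithm: lcp[r] is the common prefix length of the
    suffixes at ranks r - 1 and r; lcp[0] is 0."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


text = "banana\x00"  # sentinel appended by hand
sa = suffix_array(text)
bwt_text = bwt_from_sa(text, sa)
lcp = lcp_array(text, sa)
```

Once the suffix array exists, the BWT is a single pass over it, which is why a fast suffix array builder is useful for BWT-based compressors.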
Metric | wbz | libsais
---|---|---
Mentions | 1 | 1
Stars | 13 | 164
Growth | - | -
Activity | 0.0 | 5.8
Last commit | almost 2 years ago | 25 days ago
Language | Python | C
License | Apache License 2.0 | Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
wbz
Posts with mentions or reviews of wbz.
We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-06-15.
Data Engineering Projects for Beginners
Building a Lossless Data Compression and Data Decompression Pipeline
libsais
Posts with mentions or reviews of libsais.
We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-09.
The Technical Workloads Where AMD Ryzen 9 7900X3D/7950X3D CPUs Are Excellent
The old engineered state of the art was divsufsort, but now there is libsais, which makes use of prefetching (it would be interesting to see how both react to huge caches). As for datasets, there are many classical ones. In rough order of size: Silesia Corpus, Manzini Corpus, Pizza&Chili Corpus, Large Text Compression Benchmark Corpus, etc.
What are some alternatives?
When comparing wbz and libsais you can also consider the following projects:
docker-livy - Dockerizing and Consuming an Apache Livy environment
libdivsufsort - A lightweight suffix-sorting library
pyDag - Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag
SeqAn - SeqAn's official repository.
sdsl-lite - Succinct Data Structure Library 3.0
lzsa - Byte-aligned, efficient lossless packer that is optimized for fast decompression on 8-bit micros
cello - A string library