casync vs zstd
 | casync | zstd
---|---|---
Mentions | 17 | 105
Stars | 1,461 | 22,293
Growth | 0.7% | 1.9%
Activity | 2.4 | 9.6
Last commit | 4 months ago | 8 days ago
Language | C | C
License | - | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
casync
-
Tool to clone file structure without the large files themselves?
You probably want casync.
-
LibSQL – a fork of SQLite that is both Open Source, and Open Contributions
(personally, I think more people need to be aware of casync for the update storage/distribution problem. It isn't perfect for every use case, but it's good enough that you're probably better off wrapping/forking it rather than reimplementing it badly from scratch)
-
improving download infra
Does something like casync (https://github.com/systemd/casync or https://github.com/folbricht/desync) serve any purpose or provide any advantage over rsync for propagating rpm changes?
-
Are there any true alternatives to Seafile? (Nextcloud is not an alternative in this context)
Software that comes to mind for syncing lots of small files: git (and other source versioning tools), casync (https://github.com/systemd/casync) and a go implementation (https://github.com/folbricht/desync). Not really an answer and I can't think of a way to shoehorn that into your workflow, but maybe it leads you down a useful road.
-
Hacker News top posts: Apr 23, 2022
Casync – A Content-Addressable Data Synchronization Tool (15 comments)
-
Casync – A Content-Addressable Data Synchronization Tool
I was wondering how this finds any common chunks at all once file boundaries are removed. It turns out that chunks don't have a set size, just min/max/avg values, so unaligned streams may end up synchronizing. https://github.com/systemd/casync/blob/master/src/cachunker.... If I understood that correctly, that's pretty cool.
But looking at the code I'm having strong "nope" feelings. First, because of lines like "q += m, n -= m;". Second, because of int/enum/semantic abuse: `compression_type` may be _CA_COMPRESSION_TYPE_INVALID which I hope is 0, `>= 0` as a known compression type, or `-EAGAIN` as an error. (from https://github.com/systemd/casync/blob/99559cd1d8cea69b30022... ) I'd bet that just throwing afl at the decompressor will find issues :(
I do like the idea though.
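To make the mechanism from that comment concrete, here is a minimal sketch of content-defined chunking with min/max bounds around a target average. This is an illustrative toy, not casync's algorithm: casync's real chunker (src/cachunker.c) uses a Buzhash rolling hash, and the window size, size bounds, and hash here are made-up stand-ins. The key property is that a cut point depends only on the bytes in the sliding window, which is why two streams that differ by an insertion start producing identical boundaries again shortly after the edit.

```c
/*
 * Toy content-defined chunker (illustrative parameters only; NOT
 * casync's actual Buzhash implementation). Compile: cc -O2 cdc.c
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW     48                 /* sliding-window size in bytes */
#define CHUNK_MIN  (16 * 1024)        /* hard lower bound on chunk size */
#define CHUNK_MAX  (256 * 1024)       /* hard upper bound on chunk size */
#define AVG_MASK   ((1u << 16) - 1)   /* a boundary ~every 64 KiB on average */

static void chunk_buffer(const uint8_t *data, size_t len)
{
    uint8_t win[WINDOW] = {0};        /* ring buffer of the last WINDOW bytes */
    size_t wpos = 0, start = 0;
    const uint32_t base = 31;
    uint32_t hash = 0, base_pow_w = 1;

    for (int i = 0; i < WINDOW; i++)  /* base^WINDOW (mod 2^32) */
        base_pow_w *= base;

    for (size_t i = 0; i < len; i++) {
        uint8_t out = win[wpos];      /* byte leaving the window */
        win[wpos] = data[i];
        wpos = (wpos + 1) % WINDOW;

        /* Polynomial rolling hash over only the last WINDOW bytes;
         * this locality is what lets unaligned streams resynchronize. */
        hash = hash * base + data[i] - out * base_pow_w;

        size_t chunk_len = i - start + 1;
        if (chunk_len < CHUNK_MIN)
            continue;                 /* enforce the minimum size */
        /* Cut where the hash hits the pattern, or at the hard maximum. */
        if ((hash & AVG_MASK) == 0 || chunk_len >= CHUNK_MAX) {
            printf("chunk: offset=%zu size=%zu\n", start, chunk_len);
            start = i + 1;
        }
    }
    if (start < len)
        printf("chunk: offset=%zu size=%zu\n", start, len - start);
}

int main(void)
{
    size_t len = 4u * 1024 * 1024;    /* 4 MiB of pseudo-random input */
    uint8_t *buf = malloc(len);
    if (!buf)
        return 1;
    uint32_t x = 123456789;
    for (size_t i = 0; i < len; i++) {
        x = x * 1664525u + 1013904223u;  /* simple LCG */
        buf[i] = (uint8_t)(x >> 24);
    }
    chunk_buffer(buf, len);
    free(buf);
    return 0;
}
```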
-
Blobcache is a content addressed data store, designed to be a replicated data layer for applications.
Compare https://github.com/systemd/casync which handles splitting/diffing, but does not handle fancy replication.
-
Deduplicating Archiver with Compression and Encryption
zstd
-
Chrome Feature: ZSTD Content-Encoding
Citation needed? https://github.com/facebook/zstd/commits/dev/?author=jiaT75
Of course, you may get different results with another dataset.
gzip (zlib -6) [ratio=32%] [compr=35Mo/s] [dec=407Mo/s]
zstd (zstd -2) [ratio=32%] [compr=356Mo/s] [dec=1067Mo/s]
NB1: The default for zstd is -3, but the table only had -2. The difference is probably small. The range is 1-22 for zstd and 1-9 for gzip.
NB2: The default gzip program (at least on Debian) is the executable from zlib. In my workflows, libdeflate-gzip is compatible and noticeably faster.
NB3: This benchmark is 2 years old. The latest releases of zstd are much better, see https://github.com/facebook/zstd/releases
For high compression, according to this benchmark xz can do slightly better, if you're willing to pay a 10× penalty on decompression.
xz -9 [ratio=23%] [compr=2.6Mo/s] [dec=88Mo/s]
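If it helps to map those CLI level numbers onto code, here is a small sketch using libzstd's one-shot API; the payload string is a placeholder, and the level is the -2 benchmarked above.

```c
/*
 * Sketch: how CLI levels map onto libzstd's one-shot API.
 * Link with -lzstd.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int main(void)
{
    const char *src = "placeholder payload; in practice a file buffer";
    size_t srcSize = strlen(src);
    size_t cap = ZSTD_compressBound(srcSize);  /* worst-case output size */
    void *dst = malloc(cap);
    if (!dst)
        return 1;

    int level = 2;  /* the "zstd -2" above; the default is ZSTD_CLEVEL_DEFAULT (3) */
    size_t n = ZSTD_compress(dst, cap, src, srcSize, level);
    if (ZSTD_isError(n)) {
        fprintf(stderr, "error: %s\n", ZSTD_getErrorName(n));
        free(dst);
        return 1;
    }
    printf("level %d: %zu -> %zu bytes (levels go up to %d)\n",
           level, srcSize, n, ZSTD_maxCLevel());
    free(dst);
    return 0;
}
```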
There is an issue tracking this, with a bunch of links to discussions about it, but it seems they still haven't found the time for it.
https://github.com/facebook/zstd/issues/3100
This was the first place my mind went when I saw this Content-Encoding announcement, so I went and re-checked the issue :(.
Yes, but they also work for megacorp Facebook, and according to https://github.com/facebook/zstd/graphs/contributors, 300+ other contributors have made 4500+ commits to the zstd repo.
It's not quite as small-scale as 90's-style shareware was.
-
Show HN: macOS-cross-compiler – Compile binaries for macOS on Linux
-
How in the world should we unpack archive.org zst files on Windows?
If you want this functionality in zstd itself, check this out: https://github.com/facebook/zstd/pull/2349
-
ZSTD 1.5.5 is released with a corruption fix found at Google
-
Float Compression 3: Filters
Interesting to compare with observations from practical experience using ClickHouse[1][2] for time series:
1. Reordering to SOA helps a lot - this is the whole point of column-oriented databases.
2. Specialized codecs like Gorilla[3], DoubleDelta[4], and FPC[5] lose to simply using ZSTD[6] compression in most cases, both in compression ratio and in performance.
3. Specialized time-series DBMS like InfluxDB or TimescaleDB lose to general-purpose relational OLAP DBMS like ClickHouse [7][8][9].
[1] https://clickhouse.com/blog/optimize-clickhouse-codecs-compr...
[2] https://github.com/ClickHouse/ClickHouse
[3] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[4] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[5] https://clickhouse.com/docs/en/sql-reference/statements/crea...
[6] https://github.com/facebook/zstd/
[7] https://arxiv.org/pdf/2204.09795.pdf "SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things" (2022)
[8] https://gitlab.com/gitlab-org/incubation-engineering/apm/apm... https://gitlab.com/gitlab-org/incubation-engineering/apm/apm...
[9] https://www.sciencedirect.com/science/article/pii/S187705091...
-
We're wasting money by only supporting gzip for raw DNA files
zstd has a long range mode, which lets it find redundancies a gigabyte away. Try --long and --long=31 for very long range mode.
zstd has delta / patch mode, which creates a file that stores the "patch" to create a new file from an old (reference) file. See https://github.com/facebook/zstd/wiki/Zstandard-as-a-patchin...
See the man page: https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md
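For anyone who wants those two modes from C rather than the CLI, below is a hedged sketch using libzstd's advanced API: ZSTD_CCtx_refPrefix plus long-distance matching is the library-level analogue of --patch-from combined with --long. The windowLog of 27 (a 128 MiB window), the fixed buffer sizes, and the minimal error handling are illustrative choices, not the only valid ones.

```c
/*
 * Sketch of zstd's long-range and patch modes via libzstd's advanced
 * API (analogue of the --long / --patch-from CLI flags). Link with
 * -lzstd; windowLog and buffer plumbing are illustrative.
 */
#include <stdio.h>
#include <zstd.h>

/* Compress newBuf against oldBuf as a reference, producing a delta. */
static size_t make_patch(const void *oldBuf, size_t oldSize,
                         const void *newBuf, size_t newSize,
                         void *dst, size_t dstCap)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();

    /* Long-distance matching finds redundancies far back in the window;
     * windowLog bounds how far (--long=31 corresponds to windowLog 31;
     * 27 here means 128 MiB). */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);

    /* Treat the old file as a prefix dictionary: matches into it turn
     * the output into a compact patch rather than a full compression. */
    ZSTD_CCtx_refPrefix(cctx, oldBuf, oldSize);

    size_t n = ZSTD_compress2(cctx, dst, dstCap, newBuf, newSize);
    if (ZSTD_isError(n))
        fprintf(stderr, "compress: %s\n", ZSTD_getErrorName(n));
    ZSTD_freeCCtx(cctx);
    return n;
}

/* Rebuild the new file: decompress the patch with the same old file
 * referenced as prefix. */
static size_t apply_patch(const void *oldBuf, size_t oldSize,
                          const void *patch, size_t patchSize,
                          void *dst, size_t dstCap)
{
    ZSTD_DCtx *dctx = ZSTD_createDCtx();

    /* Raise the decoder's window limit to match the encoder's. */
    ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 27);
    ZSTD_DCtx_refPrefix(dctx, oldBuf, oldSize);

    size_t n = ZSTD_decompressDCtx(dctx, dst, dstCap, patch, patchSize);
    if (ZSTD_isError(n))
        fprintf(stderr, "decompress: %s\n", ZSTD_getErrorName(n));
    ZSTD_freeDCtx(dctx);
    return n;
}

int main(void)
{
    const char oldv[] = "The quick brown fox jumps over the lazy dog. v1";
    const char newv[] = "The quick brown fox jumps over the lazy dog. v2";
    char patch[256], rebuilt[256];

    size_t psz = make_patch(oldv, sizeof oldv, newv, sizeof newv,
                            patch, sizeof patch);
    size_t rsz = apply_patch(oldv, sizeof oldv, patch, psz,
                             rebuilt, sizeof rebuilt);
    printf("patch %zu bytes -> rebuilt %zu bytes\n", psz, rsz);
    return 0;
}
```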
What are some alternatives?
LZ4 - Extremely Fast Compression algorithm
Snappy - A fast compressor/decompressor
LZMA - (Unofficial) Git mirror of LZMA SDK releases
7-Zip-zstd - 7-Zip with support for Brotli, Fast-LZMA2, Lizard, LZ4, LZ5 and Zstandard
ZLib - A massively spiffy yet delicately unobtrusive compression library.
brotli - Brotli compression format
haproxy - HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
LZFSE - LZFSE compression library and command line tool
zlib-ng - zlib replacement with optimizations for "next generation" systems.
zlib - Cloudflare fork of zlib with massive performance improvements
zfs - OpenZFS on Linux and FreeBSD
LZHAM - Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++