soci-snapshotter
zstd
soci-snapshotter | zstd | |
---|---|---|
6 | 109 | |
470 | 22,480 | |
4.0% | 1.7% | |
9.2 | 9.7 | |
2 days ago | 4 days ago | |
Go | C | |
Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
soci-snapshotter
- Seekable OCI - Lazy Loading Container Images on ECS and Fargate
-
Kubernetes SidecarContainers feature is merged
So I can give some behind the scenes insight on that. I don't think image caching will be a thing in the way people are explicitly asking, but we are exploring some alternative approaches to speeding up container launch that we think will actually be even more effective than what people are asking for.
First of all we want to leverage some of the learnings from AWS Lambda, in specific some of the research we've done that shows that about 75% of container images only contain 5% unique bytes (https://brooker.co.za/blog/2023/05/23/snapshot-loading.html). This makes deduplication incredibly effective, and allows the deployment of a smart cache that holds the 95% of popular recurring files and file chunks from container images, while letting the unique 5% be loaded over the network. There will be outliers of course, but if you base your image off a well used base image then it will already be in the cache. This is partially implemented. You will notice that if you use certain base images your Fargate container seems to start a bit faster. (Unfortunately we do not really publish this list or commit to what base images are in the cache at this time).
In another step along this path we are working on SOCI Snapshotter (https://github.com/awslabs/soci-snapshotter) forked off of Stargz Snapshotter. This allows a container image to have an attached index file that actually allows it to start up before all the contents are downloaded, and lazy load in remaining chunks of the image as needed. This takes advantage of another aspect of container images which is that many of them don't actually use all of the bytes in the image anyway.
Over time we want to make these two pieces (deduplication and lazy loading) completely behind the scenes so you just upload your image to Elastic Container Registry and AWS Fargate seems to magically start your image dramatically faster than you could locally if downloading the image from scratch.
-
What is a containerd snapshotter?
This behavior allows us to set up snapshots out of band, that is outside the "normal" workflow. One such example is the soci snapshotter which allows image lazy loading. The snapshotter ships with a "rpull" command which performs this out of band prep. During rpull, the command calls the soci snapshotter which creates fuse mounts for each layer that has an index (remote snapshot). For layers that do not have an index, it downloads them as usual (local snapshot). The local snapshot is created with overlay mount. Anyway, this is just a detail, the important bit is that folders created for a local snapshot only contain that layer, which is exactly what overlay does. For example, after rpull we'll see something like the following:
-
A Hidden Gem: Two Ways to Improve AWS Fargate Container Launch Times
Seekable OCI (SOCI) is a technology open-sourced by AWS that enables containers to launch faster by lazily loading the container image. It’s usually not possible to fetch individual files from gzipped tar files. With SOCI, AWS borrowed some of the design principles from stargz-snapshotter, but took a different approach. A SOCI index is generated separately from the container image and is stored in the registry as an OCI Artifact and linked back to the container image by OCI Reference Types. This means that the container images do not need to be converted, image digests do not change, and image signatures remain valid.
- GitHub - awslabs/soci-snapshotter
- awslabs/soci-snapshotter
zstd
-
Rethinking string encoding: a 37.5% space efficient encoding than UTF-8 in Fury
> In such cases, the serialized binary are mostly in 200~1000 bytes. Not big enough for zstd to work
You're not referring to the same dictionary that I am. Look at --train in [1].
If you have a training corpus of representative data, you can generate a dictionary that you preshare on both sides which will perform much better for very small binaries (including 200-1k bytes).
If you want maximum flexibility (i.e. you don't know the universe of representative messages ahead of time or you want maximum compression performance), you can gather this corpus transparently as messages are generated & then generate a dictionary & attach it as sideband metadata to a message. You'll probably need to defer the decoding if it references a dictionary not yet received (i.e. send delivers messages out-of-order from generation). There are other techniques you can apply, but the general rule is that your custom encoding scheme is unlikely to outperform zstd + a representative training corpus. If it does, you'd need to actually show this rather than try to argue from first principles.
[1] https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md
-
Drink Me: (Ab)Using a LLM to Compress Text
> Doesn't take large amount of GPU resources
This is an understatement, zstd dictionary compression and decompression are blazingly fast: https://github.com/facebook/zstd/blob/dev/README.md#the-case...
My real-world use case for this was JSON files in a particular schema, and the results were fantastic.
-
SQLite VFS for ZSTD seekable format
This VFS will read a sqlite file after it has been compressed using [zstd seekable format](https://github.com/facebook/zstd/blob/dev/contrib/seekable_f...). Built to support read-only databases for full-text search. Benchmarks are provided in README.
-
Chrome Feature: ZSTD Content-Encoding
Of course, you may get different results with another dataset.
gzip (zlib -6) [ratio=32%] [compr=35Mo/s] [dec=407Mo/s]
zstd (zstd -2) [ratio=32%] [compr=356Mo/s] [dec=1067Mo/s]
NB1: The default for zstd is -3, but the table only had -2. The difference is probably small. The range is 1-22 for zstd and 1-9 for gzip.
NB2: The default program for gzip (at least with Debian) is the executable from zlib. With my workflows, libdeflate-gzip iscompatible and noticably faster.
NB3: This benchmark is 2 years old. The latest releases of zstd are much better, see https://github.com/facebook/zstd/releases
For a high compression, according to this benchmark xz can do slightly better, if you're willing to pay a 10× penalty on decompression.
xz -9 [ratio=23%] [compr=2.6Mo/s] [dec=88Mo/s]
zstd -9 [ratio=23%] [compr=2.6Mo/s] [dec=88Mo/s]
- Zstandard v1.5.6 – Chrome Edition
-
Optimizating Rabin-Karp Hashing
Compression, synchronization and backup systems often use rolling hash to implement "content-defined chunking", an effective form of deduplication.
In optimized implementations, Rabin-Karp is likely to be the bottleneck. See for instance https://github.com/facebook/zstd/pull/2483 which replaces a Rabin-Karp variant by a >2x faster Gear-Hashing.
- Show HN: macOS-cross-compiler – Compile binaries for macOS on Linux
-
Cyberpunk 2077 dev release
Get the data https://publicdistst.blob.core.windows.net/data/root.tar.zst magnet:?xt=urn:btih:84931cd80409ba6331f2fcfbe64ba64d4381aec5&dn=root.tar.zst How to extract https://github.com/facebook/zstd Linux (debian): `sudo apt install zstd` ``` tar -I 'zstd -d -T0' -xvf root.tar.zst ```
-
Honey, I shrunk the NPM package · Jamie Magee
I've done that experiment with zstd before.
https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md...
Not sure about brotli though.
-
How in the world should we unpack archive.org zst files on Windows?
If you want this functionality in zstd itself, check this out: https://github.com/facebook/zstd/pull/2349
What are some alternatives?
stargz-snapshotter - Fast container image distribution plugin with lazy pulling
LZ4 - Extremely Fast Compression algorithm
cloudsql-proxy - A utility for connecting securely to your Cloud SQL instances [Moved to: https://github.com/GoogleCloudPlatform/cloud-sql-proxy]
Snappy - A fast compressor/decompressor
enhancements - Enhancements tracking repo for Kubernetes
LZMA - (Unofficial) Git mirror of LZMA SDK releases
containers-roadmap - This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
7-Zip-zstd - 7-Zip with support for Brotli, Fast-LZMA2, Lizard, LZ4, LZ5 and Zstandard
kubernetes - Production-Grade Container Scheduling and Management
ZLib - A massively spiffy yet delicately unobtrusive compression library.
brotli - Brotli compression format
haproxy - HAProxy Load Balancer's development branch (mirror of git.haproxy.org)