DirectStorage
rapidgzip
| | DirectStorage | rapidgzip |
---|---|---|
Mentions | 18 | 14 |
Stars | 653 | 317 |
Growth | 2.6% | - |
Activity | 4.5 | 9.5 |
Latest commit | 4 months ago | 12 days ago |
Language | C++ | C++ |
License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DirectStorage
-
Game Ready & Studio Driver 535.98 FAQ/Discussion
I don't think the GPU decompression optimizations are in this driver. I updated, and using the Bulk Loader Demo test I'm actually seeing lower throughput than before. I used to hit around 24-25 GB/s; now I'm only hitting about 21 GB/s. It's possible, though I doubt it, that this is related to the Windows 11 March update reducing SSD bandwidth. I haven't run the test in months, so it might be related.
-
Pigz: Parallel gzip for modern multi-processor, multi-core machines
The data is compressed with GDeflate, not deflate. The single stream is designed to use the parallelism of a GPU. It is described here:
https://github.com/microsoft/DirectStorage/blob/main/GDeflat...
The GPU decompression benchmark I linked earlier allows you to specify a single file that it will compress with GDeflate (and zlib for comparison). The numbers presented in the docs that come with the benchmark and presented elsewhere are consistent with my own runs using a source file that is highly compressible.
Part of the trick of achieving this speedup is to read the data fast enough. I don't know of any NVMe drive that can reach full speed with a queue depth of 1. While running the benchmark in a Windows VM with a GPU passed through, I observed on the Linux host that the average read size was about 512k and the queue depth was sometimes over 30.
-
From Project Management to Data Compression Innovator: Building LZ4, ZStandard, and Finite State Entropy Encoder
We already have GDeflate, with permissive sources available for both CPU compression/decompression and GPU decompression in the DirectStorage GitHub repo. I haven't personally played with it yet, but I'll be implementing it in a project I'm working on in the next few months and am pretty excited to do so.
-
[Digital Foundry] The Last of Us Part 1 PC vs PS5 - A Disappointing Port With Big Problems To Address
Wrong: https://github.com/microsoft/DirectStorage/blob/main/Docs/diagrams.mmd
-
DirectStorage Performance Compared: AMD vs Intel vs Nvidia
The GitHub repo documents some command line parameters.
- DirectStorage in Star Citizen after Gen12
-
Samsung 990 Pro tested with DirectStorage. The Samsung 990 Pro, like the Sabrent Rocket 4 Plus-G and WD SN850X, has gaming / DirectStorage optimizations.
Only the final test where the results of several SSDs are displayed in the graph is a synthetic one. The first two both support DirectStorage and are designed with Microsoft's recommendations for DirectStorage in mind. That is, random reads of 32k or greater block sizes with high queue depths. This is because you need a high queue depth to be able to saturate NVMe drives.
-
Valve Halves Steam Deck SSD Bandwidth on Some Models
For most people it'll be a background element they're not aware of. If you're running an up-to-date Windows 10 or later, you have DirectStorage capabilities; you can get the sample from Microsoft, build it, and run it fine.
- DirectStorage API for Windows
-
Looks like PS5 exclusive Returnal is headed to PC
DirectStorage GitHub samples: https://github.com/microsoft/DirectStorage
rapidgzip
- Show HN: Rapidgzip – Parallel Gzip Decompressing with 10 GB/S
-
Ebiggers/libdeflate: Heavily optimized DEFLATE/zlib/gzip library
I also did benchmarks with zlib and libarchivemount via their library interface here [0]. It has been a while since I ran them, so I forgot. Unfortunately, I did not add libdeflate.
[0] https://github.com/mxmlnkn/rapidgzip/blob/master/src/benchma...
-
Rapidgzip – Parallel Decompression and Seeking in Gzip (Knespel, Brunst – 2023) [pdf]
Hi, author here.
You are right that the index is the easy mode. Over the years there have been lots of implementations trying to add an index like that to the gzip metadata itself or as a sidecar file, with bgzip probably being the best-known one. None of them really stuck, hence the necessity for some generic multi-threaded decompressor. A probably incomplete list of such implementations can be found in this issue: https://github.com/mxmlnkn/rapidgzip/issues/8
The index makes it so easy that I can simply delegate decompression to zlib. And since paper publication I've actually improved upon this by delegating to ISA-l / igzip instead, which is twice as fast. This is already in the 0.8.0 release.
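To illustrate why an index turns parallel decompression into the easy case, here is a minimal Python sketch that uses only the standard `zlib` module. It is not rapidgzip's implementation: the helper names (`compress_with_index`, `decompress_span`, `parallel_decompress`) are hypothetical, and it assumes the simplest possible index, namely byte offsets of `Z_FULL_FLUSH` points in a raw deflate stream. An index for arbitrary gzip files generally also has to store bit offsets and the preceding 32 KiB window.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_with_index(chunks):
    """Compress non-empty list `chunks` into one raw deflate stream.

    Returns (data, offsets), where offsets[i] is the byte offset at which
    chunk i starts. Z_FULL_FLUSH byte-aligns the output and resets the
    compressor history, so each offset is a valid restart point.
    """
    comp = zlib.compressobj(level=6, wbits=-15)  # wbits=-15: raw deflate
    out = bytearray()
    offsets = []
    for i, chunk in enumerate(chunks):
        offsets.append(len(out))
        out += comp.compress(chunk)
        is_last = i == len(chunks) - 1
        out += comp.flush(zlib.Z_FINISH if is_last else zlib.Z_FULL_FLUSH)
    return bytes(out), offsets


def decompress_span(data, start, end):
    """Decompress one indexed span on its own by delegating to zlib."""
    return zlib.decompressobj(wbits=-15).decompress(data[start:end])


def parallel_decompress(data, offsets):
    """Decompress all spans with a thread pool and join them in order."""
    ends = offsets[1:] + [len(data)]
    with ThreadPoolExecutor() as pool:
        parts = pool.map(decompress_span, [data] * len(offsets), offsets, ends)
        return b"".join(parts)


chunks = [b"a" * 1000, b"b" * 1000, b"c" * 1000]
data, offsets = compress_with_index(chunks)
assert parallel_decompress(data, offsets) == b"".join(chunks)
# Random access: the second chunk can be read without touching the first.
assert decompress_span(data, offsets[1], offsets[2]) == chunks[1]
```

With the offsets known up front, every span is independent, which is why the actual decoding can be handed to an existing library.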
As derived from table 1, the false positive rate for deflate blocks with dynamic Huffman code is roughly one per 5 Gbit (1 Tbit / 202), or about 625 MB. For non-compressed blocks, the false positive rate is roughly one per 500 KB; however, non-compressed blocks can basically be memcpied or skipped over, and then the next deflate header can be checked without much latency. For dynamic blocks, on the other hand, the whole block needs to be decompressed first to find the next one. So the much higher false positive rate for non-compressed blocks doesn't introduce that much overhead.
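The arithmetic behind these figures can be checked with a short Python sketch. It assumes that "1 Tbit / 202" means 202 false positives in 10^12 tested bits (decimal units) and that 500 KB means 500,000 bytes; the variable names are only illustrative.

```python
TESTED_BITS = 10**12             # 1 Tbit, assuming decimal units
DYNAMIC_FALSE_POSITIVES = 202    # count quoted from table 1 in the comment

# Average distance between false positives for dynamic Huffman blocks.
bits_per_fp = TESTED_BITS / DYNAMIC_FALSE_POSITIVES
bytes_per_fp = bits_per_fp / 8

print(f"{bits_per_fp / 1e9:.2f} Gbit")  # 4.95 Gbit, rounded to 5 Gbit above
print(f"{bytes_per_fp / 1e6:.0f} MB")   # 619 MB; 625 MB follows from the rounded 5 Gbit

# Non-compressed blocks: roughly one false positive per 500 KB.
NON_COMPRESSED_BYTES_PER_FP = 500e3
ratio = bytes_per_fp / NON_COMPRESSED_BYTES_PER_FP
print(f"about {ratio:.0f}x more frequent for non-compressed blocks")
```

So false positives for non-compressed blocks are on the order of a thousand times more frequent, which is the gap the comment argues is cheap to absorb.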
I have some profiling built into rapidgzip, which is printed with -v, e.g., rapidgzip -v -d -o /dev/null 20xsilesia.tar.gz :
Time spent in block finder : 0.227751 s
- Intel QuickAssist Technology Zstandard Plugin for Zstandard
- Tool and Library for Parallel Gzip Decompression and Random Access
-
Pigz: Parallel gzip for modern multi-processor, multi-core machines
I have not only implemented parallel decompression but also random access to offsets in the stream with https://github.com/mxmlnkn/pragzip. I did some benchmarks on some really beefy machines with 128 cores and was able to reach almost 20 GB/s decompression bandwidth. The single-core decoder has lots of potential for optimization because I had to write it from scratch, though.
-
Parquet: More than just “Turbo CSV”
Decompression of arbitrary gzip files can be parallelized with pragzip: https://github.com/mxmlnkn/pragzip
-
The Cost of Exception Handling
At the very least you are duplicating logic without the exception. The check for eof has to be done implicitly anyway inside read because it has to fill the bit buffer with data from the byte buffer or the byte buffer with data from the file. And if both fail, then we already know the result of eof, so no need to duplicate checking for eof in the outer read calling loop.
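The structure being described can be sketched in Python as follows. This is only an illustration of the control flow, not the pragzip C++ code: `BitReader` and `read_all` are hypothetical names, and the sketch says nothing about the C++ performance numbers discussed here. The point is that `read` already discovers end-of-file while refilling its buffer, so the calling loop does not need a separate `eof()` check per iteration.

```python
import io


class BitReader:
    """Hypothetical LSB-first bit reader over a binary stream."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = 0  # bits not yet consumed
        self._bits = 0    # number of valid bits in _buffer

    def read(self, n):
        # Refilling is where EOF is detected anyway, so signal it from here.
        while self._bits < n:
            byte = self._stream.read(1)
            if not byte:
                raise EOFError("not enough bits left in the stream")
            self._buffer |= byte[0] << self._bits
            self._bits += 8
        value = self._buffer & ((1 << n) - 1)
        self._buffer >>= n
        self._bits -= n
        return value


def read_all(reader, n):
    """Read n-bit values until the stream ends, with no eof() call per read.

    Trailing bits that do not fill a whole n-bit value are dropped.
    """
    values = []
    try:
        while True:
            values.append(reader.read(n))
    except EOFError:
        return values


print(read_all(BitReader(io.BytesIO(b"\xab\xcd")), 4))  # [11, 10, 13, 12]
```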
Here is the full commit with ad-hoc benchmark results in the commit message:
https://github.com/mxmlnkn/pragzip/commit/0b1af498377838c30f...
and here the benchmarks I ran at that time:
https://github.com/mxmlnkn/pragzip/blob/0b1af498377838c30fea...
As you can see, it's part of my random-seekable multi-threaded gzip and bzip2 parallel decompression libraries.
What you can also see in the commit message is that it wasn't a 50% time reduction but a 50% bandwidth increase, which would translate to a 30% time reduction. It seems I remembered that partly wrong. But it still was a significant optimization for me.
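The conversion between a bandwidth increase and a time reduction is easy to get wrong, so here is the calculation as a small Python sketch (the function names are illustrative). For a fixed amount of data, time is inversely proportional to bandwidth, so a 50% bandwidth increase gives a time reduction of 1 - 1/1.5, about 33%, which is the "30%" figure above stated more precisely.

```python
def time_reduction(bandwidth_increase):
    """Fractional time saved for a fractional bandwidth increase."""
    return 1 - 1 / (1 + bandwidth_increase)


def bandwidth_increase(time_reduction_fraction):
    """Inverse: fractional bandwidth gain for a fractional time saving."""
    return 1 / (1 - time_reduction_fraction) - 1


print(f"{time_reduction(0.5):.1%}")      # 33.3%
print(f"{bandwidth_increase(0.5):.0%}")  # 100%
```

Conversely, a true 50% time reduction would have required doubling the bandwidth.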
- How Much Faster Is Making a Tar Archive Without Gzip?
- Show HN: Thread-Parallel Decompression and Random Access to Gzip Files (Pragzip)
What are some alternatives?
Vortice.Windows - .NET bindings for Direct3D12, Direct3D11, WIC, Direct2D1, XInput, XAudio, X3DAudio, DXC, Direct3D9 and DirectInput.
pigz - A parallel implementation of gzip for modern multi-processor, multi-core machines.
DirectX12GameEngine - DirectX 12 .NET game engine
QATzip - Compression Library accelerated by Intel® QuickAssist Technology
X1nput - Xinput hook for Impulse Trigger emulation
parquet-format - Apache Parquet
display-drivers-uninstaller - Display Driver Uninstaller (DDU) a driver removal utility / cleaner utility
nvcomp - Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
pixz - Parallel, indexed xz compressor
solaris-userland - Open Source software in Solaris using gmake based build system to drive building various software components.
QAT-ZSTD-Plugin