-
kernel_tuner_tutorial
A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/
-
wonnx
A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web
This is a great post for people who are new to optimizing GPU code.
It is interesting to see that the author got this far without interchanging the loops so that the innermost loop over k becomes the outermost one, as is done in CUTLASS (https://github.com/NVIDIA/cutlass).
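To make the loop-interchange remark concrete, here is a minimal plain-Python sketch that contrasts the two loop orders. It is not CUTLASS code and not the kernel from the blog post; the function names and the tiny matrices are made up for illustration. With k innermost, each output element is computed as a dot product. With k outermost, every step of k adds the outer product of one column of A and one row of B to the whole output tile, so each loaded value is reused across a full row or column of the tile.

```python
def matmul_k_inner(A, B):
    """Inner-product order: k is the innermost loop."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def matmul_k_outer(A, B):
    """Outer-product order: k is the outermost loop."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for k in range(K):
        col_a = [A[i][k] for i in range(M)]  # column k of A, fetched once per k
        row_b = B[k]                         # row k of B, fetched once per k
        for i in range(M):
            for j in range(N):
                # Rank-1 update: each fetched value is reused across the tile.
                C[i][j] += col_a[i] * row_b[j]
    return C


# Small hypothetical inputs: A is 2x3, B is 3x2.
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]

# Both loop orders compute the same product.
assert matmul_k_inner(A, B) == matmul_k_outer(A, B)
```

In the second form, each step of k fetches M + N values and performs M * N multiply-adds, which is the reuse pattern that makes the k-outermost order attractive for register tiling.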
As you can see in this blog post, the code ends up with a lot of compile-time constants (e.g. BLOCKSIZE, BM, BN, BK, TM, TN). One way to optimize this code further is to use an auto-tuner, for example Kernel Tuner (https://github.com/KernelTuner/kernel_tuner), to find the optimal values of all of these parameters for your GPU and problem size.
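As a rough illustration of what an auto-tuner has to search over, the following plain-Python sketch enumerates candidate values for some of the constants named above and filters out combinations that could not launch. It does not call Kernel Tuner. The candidate values, the meaning assigned to each constant (a BM x BN block tile, a BK tile depth, a TM x TN per-thread tile), and the limits of 1024 threads and 48 KiB of shared memory per block are all assumptions made for the sketch. A real tuner would additionally compile and time every surviving configuration on the target GPU.

```python
import math
from itertools import product

# Hypothetical candidate values; a real search would choose these per GPU.
tune_params = {
    "BM": [64, 128],
    "BN": [64, 128],
    "BK": [8, 16],
    "TM": [1, 4, 8],
    "TN": [1, 4, 8],
}

MAX_THREADS_PER_BLOCK = 1024   # assumed per-block thread limit
MAX_SHARED_BYTES = 48 * 1024   # assumed shared-memory budget per block
BYTES_PER_FLOAT = 4


def is_valid(cfg):
    """Return True if the configuration respects the assumed limits."""
    # The per-thread tile must evenly divide the block tile.
    if cfg["BM"] % cfg["TM"] or cfg["BN"] % cfg["TN"]:
        return False
    # One thread per TM x TN sub-tile of the BM x BN block tile.
    threads = (cfg["BM"] // cfg["TM"]) * (cfg["BN"] // cfg["TN"])
    # One BM x BK tile of A plus one BK x BN tile of B in shared memory.
    shared = (cfg["BM"] * cfg["BK"] + cfg["BK"] * cfg["BN"]) * BYTES_PER_FLOAT
    return threads <= MAX_THREADS_PER_BLOCK and shared <= MAX_SHARED_BYTES


def search_space(params):
    """Yield every valid combination of the tunable parameters."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        cfg = dict(zip(names, values))
        if is_valid(cfg):
            yield cfg


configs = list(search_space(tune_params))
total = math.prod(len(values) for values in tune_params.values())
print(f"{len(configs)} of {total} configurations pass the checks")
```

Even this small grid shows why tuning by hand gets tedious: the number of combinations grows multiplicatively with every added constant, and many of them are invalid for reasons that depend on the hardware.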
Kernel Tuner is great! I remember going to a tutorial at SC21. I would highly recommend the tutorials they used for getting familiar with it as well (https://github.com/KernelTuner/kernel_tuner_tutorial)
At the end of the post, he links to excalidraw[0]
[0] https://excalidraw.com/
I am curious about doing the same kind of thing for compute shaders. I'm aware of Kompute.cc (which is Vulkan based) but haven't looked at their GEMM kernels, and also of wonnx for WebGPU ([1] is their GEMM code).
I'm also curious whether warp shuffle operations might be useful to reduce some of the shared memory traffic.
[1]: https://github.com/webonnx/wonnx/blob/master/wonnx/templates...