How to Optimize a CUDA Matmul Kernel for CuBLAS-Like Performance: A Worklog

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • cutlass

    CUDA Templates for Linear Algebra Subroutines

  • This is a great post for people who are new to optimizing GPU code.

    It is interesting to see that the author got this far without moving the innermost loop over k to the outermost position, as is done in CUTLASS (https://github.com/NVIDIA/cutlass).

    As you can see in this blog post, the code ends up with a lot of compile-time constants (e.g. BLOCKSIZE, BM, BN, BK, TM, TN). One way to optimize this code further is to use an auto-tuner to find the optimal values of these parameters for your GPU and problem size, for example Kernel Tuner (https://github.com/KernelTuner/kernel_tuner).
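The loop interchange mentioned in the comment above can be illustrated with a minimal plain-Python sketch. This is not CUTLASS code and not the blog post's kernel; the function names `matmul_k_inner` and `matmul_k_outer` are hypothetical. It only shows that moving the loop over k to the outermost position turns the product into a sum of rank-1 (outer product) updates while producing the same result.

```python
def matmul_k_inner(A, B):
    """Textbook ordering: the loop over k is innermost (one dot product per output)."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def matmul_k_outer(A, B):
    """Interchanged ordering: the loop over k is outermost.

    Each k step adds the outer product of column k of A and row k of B
    to the whole output, so every loaded value of A and B is reused
    across a full row or column of C.
    """
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for k in range(K):
        row_of_b = B[k]
        for i in range(M):
            a = A[i][k]
            row_of_c = C[i]
            for j in range(N):
                row_of_c[j] += a * row_of_b[j]
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul_k_inner(A, B))
print(matmul_k_outer(A, B))
```

On a GPU the accumulators in `C` would typically live in registers, which is one reason the k-outermost form is attractive; the sketch above makes no claim about actual performance.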

  • kernel_tuner

    Kernel Tuner

  • kernel_tuner_tutorial

    A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/

  • Kernel Tuner is great! I remember going to a tutorial at SC21. I would also highly recommend the tutorials they used for getting familiar with it (https://github.com/KernelTuner/kernel_tuner_tutorial).
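The idea behind auto-tuning tile-size constants such as BM, BN and BK can be sketched in plain Python. This is a brute-force stand-in that runs on the CPU, not Kernel Tuner's API and not a GPU kernel; `tiled_matmul`, `tune` and the search space values are hypothetical names and numbers chosen for illustration. A real tuner would compile and benchmark the kernel on the target GPU.

```python
import itertools
import time


def tiled_matmul(A, B, BM, BN, BK):
    """Blocked matrix multiply; BM, BN, BK are the tile sizes being tuned."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, BM):
        for j0 in range(0, N, BN):
            for k0 in range(0, K, BK):
                # Multiply one BM x BK tile of A with one BK x BN tile of B.
                for i in range(i0, min(i0 + BM, M)):
                    row_of_c = C[i]
                    for k in range(k0, min(k0 + BK, K)):
                        a = A[i][k]
                        row_of_b = B[k]
                        for j in range(j0, min(j0 + BN, N)):
                            row_of_c[j] += a * row_of_b[j]
    return C


def tune(A, B, search_space, repeats=3):
    """Time every combination in search_space and return the fastest one."""
    timings = []
    for values in itertools.product(*search_space.values()):
        config = dict(zip(search_space.keys(), values))
        best_time = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            tiled_matmul(A, B, **config)
            best_time = min(best_time, time.perf_counter() - start)
        timings.append((best_time, config))
    fastest = min(timings, key=lambda entry: entry[0])
    return fastest[1], timings


n = 16
A = [[float((i + j) % 5) for j in range(n)] for i in range(n)]
B = [[float((i * j) % 7) for j in range(n)] for i in range(n)]
search_space = {"BM": [4, 8, 16], "BN": [4, 8, 16], "BK": [4, 16]}
best_config, timings = tune(A, B, search_space)
print("fastest configuration on this machine:", best_config)
```

The winning configuration depends on the machine and the problem size, which is the point of tuning per GPU and per problem rather than hard-coding the constants.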

  • excalidraw

    Virtual whiteboard for sketching hand-drawn like diagrams

  • At the end of the post, he links to excalidraw[0]

    [0] https://excalidraw.com/

  • wonnx

    A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web

  • I am curious about doing the same kind of thing for compute shaders. I'm aware of Kompute.cc (which is Vulkan based) but haven't looked at their GEMM kernels, and also of wonnx for WebGPU ([1] is their GEMM code).

    I'm also curious whether warp shuffle operations might be useful to reduce some of the shared memory traffic.

    [1]: https://github.com/webonnx/wonnx/blob/master/wonnx/templates...
