"Workgroup" in Vulkan/WebGPU lingo is equivalent to "thread block" in CUDA speak; see [1] for a decoder ring.
> Using atomics to solve this is rarely a good idea, atomics will make things go slowly, and there is often a way to restructure the problem so that you can let threads read data from a previous dispatch, and break your pipeline into more dispatches if necessary.
This depends on the exact workload, but I disagree. A multiple-dispatch solution to prefix sum requires reading the input at least twice, while decoupled look-back is single-pass. That's a 1.5x difference if you're memory-saturated, which is a good assumption here.
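To make the 1.5x figure concrete, here is a back-of-the-envelope traffic count (my own sketch, assuming a reduce-then-scan formulation for the multi-dispatch case and ignoring the much smaller partials array):

```rust
fn main() {
    let n = 1_000_000_000f64; // one billion input elements

    // Multi-dispatch (reduce-then-scan): pass 1 reads the input to produce
    // per-workgroup partials; pass 2 scans the (tiny) partials array; pass 3
    // reads the input a second time and writes the final output.
    let multi_dispatch_traffic = 2.0 * n /* reads */ + 1.0 * n /* writes */;

    // Decoupled look-back: a single pass that reads each element once and
    // writes each output once; the look-back itself only touches partials.
    let single_pass_traffic = 1.0 * n /* reads */ + 1.0 * n /* writes */;

    // 3n vs 2n of global memory traffic.
    println!("ratio = {}", multi_dispatch_traffic / single_pass_traffic);
}
```

If the kernel is bandwidth-bound, that traffic ratio translates directly into runtime, which is where the 1.5x comes from.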
The Nanite talk (which I linked) showed a very similar result, for very similar reasons. They have a multi-dispatch approach to their adaptive LOD resolver, and it's about 25% slower than the one that uses atomics to manage the job queue.
Thus, I think we can solidly conclude that atomics are an essential part of the toolkit for GPU compute.
You do make an important distinction between runtime and development environment, and I should fix that, but there's still a point to be made. Most people doing machine learning work need a dev environment (or use Colab), even if they're theoretically just consuming GPU code that other people wrote. And if you do distribute a CUDA binary, it only runs on Nvidia. By contrast, my stuff is a 20-second "cargo build" and you can write your own GPU code with very minimal additional setup.
[1]: https://github.com/googlefonts/compute-shader-101/blob/main/...
Yeah, sometimes atomics perform way better than you expect them to. Check out the linkedlist benchmark[1] in my suite: 12.1 G elements/s on an AMD 5700 XT using DX12. That's a respectable fraction of raw memory bandwidth. Carrying over intuition from CPU land, you'd expect it to be very slow.
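The heart of that kind of benchmark is the classic lock-free list push: one atomic exchange on the head pointer per element, with nodes pre-allocated in a flat buffer. A CPU-side sketch of the same pattern (my own illustration, not the code from the suite) looks like this:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

const NIL: usize = usize::MAX; // sentinel for "end of list"

fn main() {
    let n: usize = 256;
    // Pre-allocated node pool: next[i] is the link out of node i, mirroring
    // how a GPU kernel would use a flat buffer of nodes instead of malloc.
    let next: Arc<Vec<AtomicUsize>> =
        Arc::new((0..n).map(|_| AtomicUsize::new(NIL)).collect());
    let head = Arc::new(AtomicUsize::new(NIL));

    // Each thread pushes its node with a single atomic exchange on the head
    // pointer -- the same one-atomic-per-push pattern a GPU thread would use.
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let next = Arc::clone(&next);
            let head = Arc::clone(&head);
            thread::spawn(move || {
                let old = head.swap(i, Ordering::AcqRel);
                next[i].store(old, Ordering::Release);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // Walk the list after all pushes are done (analogous to a later
    // dispatch consuming the list) and confirm every node was linked in.
    let mut count = 0;
    let mut cur = head.load(Ordering::Acquire);
    while cur != NIL {
        count += 1;
        cur = next[cur].load(Ordering::Acquire);
    }
    assert_eq!(count, n);
    println!("pushed {} nodes", count);
}
```

The point is that each push costs exactly one contended atomic plus one independent store, so throughput degrades far more gracefully than a lock-based design would, and GPU hardware is especially good at coalescing this kind of contended atomic traffic.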
Looking at the ISA[2] you can get a glimpse of the magic that happens under the hood to make that happen. (Note: this test case is slightly simplified from what's in the repo for pedagogical reasons).
[1]: https://github.com/linebender/piet-gpu/blob/master/tests/sha...
[2]: https://shader-playground.timjones.io/da907f46d8bace9e5db7bd...