Rust-CUDA
Discontinued. Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust. [Moved to: https://github.com/Rust-GPU/Rust-CUDA] (by RDambrosio016)
The best way to do it is probably the way rust-gpu does it: https://github.com/EmbarkStudios/rust-gpu/blob/main/docs/src...
The entry point of the kernel would supply any objects that have special properties.
https://github.com/RDambrosio016/Rust-CUDA/blob/master/guide...
* Missing atomics: a gamebreaker, IMO. Atomics are absolutely essential when you regularly deal with 10,000+ threads. You'll inevitably come across a shared data structure that every thread needs to write to, and that requires some coordination mechanism. Atomics are one important tool for that.
Ironically, a few days ago I argued for fork-join parallelism in most cases (aka kernel launch / synchronized kernel exit). Now I find myself arguing the opposite, because this topic is about missing atomics. Atomics need to be used very, very rarely, but those rare uses are incredibly important.
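To make the atomics point concrete, here is a minimal sketch of the "every thread writes to one shared structure" case: a histogram whose bins are incremented with atomic adds. It is a CPU-side analogy using Rust's standard library threads, not GPU code and not Rust-CUDA's API; the function name `atomic_histogram` and its parameters are made up for illustration, and `bins` is assumed to be non-zero.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Counts how often each value (mod `bins`) occurs in `data`, with several
/// worker threads all incrementing the same shared bins.
/// Hypothetical helper for illustration; assumes `bins > 0`.
fn atomic_histogram(data: &[u8], bins: usize, threads: usize) -> Vec<u32> {
    // One atomic counter per bin; this is the shared, write-contended structure.
    let hist: Vec<AtomicU32> = (0..bins).map(|_| AtomicU32::new(0)).collect();

    // Split the input into roughly equal chunks, one per worker.
    let workers = threads.max(1);
    let chunk = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        for part in data.chunks(chunk) {
            let hist = &hist;
            s.spawn(move || {
                for &v in part {
                    // Atomic read-modify-write: no increments are lost even
                    // when two workers hit the same bin at the same time.
                    hist[v as usize % bins].fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    // All workers have finished once the scope ends, so plain reads are fine.
    hist.into_iter().map(|a| a.into_inner()).collect()
}
```

Calling `atomic_histogram(&data, 4, 8)` returns one count per bin. Without the atomic add, concurrent increments of the same bin could overwrite each other, which is the coordination problem described above.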
* Warp vote / match / reduce / shuffle missing (very useful tools for highly optimized code, but you can write slower code that does the same thing through __shared__ memory just fine).
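As a rough illustration of the slower shared-memory route for a reduce, the usual pattern is a tree reduction over a block-local scratch buffer. The sketch below simulates that pattern sequentially in plain Rust: the `scratch` vector stands in for a `__shared__` array, and each pass of the inner loop stands in for what the threads of a block would do between barriers. It is not GPU code, and `tree_reduce_sum` is a hypothetical name.

```rust
/// Sums `values` using the tree pattern commonly used for block-level
/// reductions in shared memory. Sequential CPU simulation for illustration.
fn tree_reduce_sum(values: &[u32]) -> u32 {
    if values.is_empty() {
        return 0;
    }

    // Pad to a power of two so every step halves the active range cleanly.
    let n = values.len().next_power_of_two();
    let mut scratch = vec![0u32; n]; // stand-in for a __shared__ array
    scratch[..values.len()].copy_from_slice(values);

    let mut stride = n / 2;
    while stride > 0 {
        // On a GPU, each `lane` would be one thread, and a block-wide
        // barrier would separate one stride from the next.
        for lane in 0..stride {
            scratch[lane] = scratch[lane].wrapping_add(scratch[lane + stride]);
        }
        stride /= 2;
    }

    // After log2(n) steps the total sits in slot 0.
    scratch[0]
}
```

Within one step, each lane reads a slot at or above `stride` and writes a slot below it, so the lanes do not interfere with each other. A warp shuffle reduce reaches the same result without the round trips through the scratch buffer, which is why its absence costs performance rather than expressiveness.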
------
Wait, does this support __shared__ memory at all? Raw access to memory is not really amenable to Rust's programming style, but it's absolutely necessary for high-performance GPU programming.
> "Extremely fast"
When people make claims like this, it would be good if they put the benchmarks on the first page. E.g., how does it compare with https://github.com/gfx-rs/wgpu, which lets you target Vulkan, Metal, DX, GL, or WASM+WebGPU with Rust?
It would be really nice to have an actual cross-platform GPGPU library. Vendor lock-in is holding back every kind of progress.
Maybe WebGPU will be capable of compute to the extent that CUDA isn't necessary. https://github.com/UpsettingBoy/gpgpu-rs