Generic GPU Kernels

CUDA.jl

Mentions: 15 | Stars: 1,131 | Activity: 9.5 | Language: Julia

CUDA programming in Julia.

Should have (2017) in the title.
Indeed, it's cool to program Julia directly on the GPU, and this has evolved further since then; see https://juliagpu.org/

Halide

Mentions: 43 | Stars: 5,703 | Activity: 9.5 | Language: C++

a language for fast, portable data-parallel computation

Unfortunately, I don't see a "just a bit of magic here without learning much of anything new" interface coming, because it's all about strategizing the movement of data. This is not unique to GPUs; it's a universal problem across computing hardware. OpenCL/CUDA just let you make it explicit, whereas in most languages you try to steer things the right way and the CPU does its best with whatever mess it gets.
The closest I know of is https://halide-lang.org/, and that is specialized for image processing.
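To make "strategizing the movement of data" concrete, here is a minimal sketch of a cache-blocked (tiled) matrix transpose. The function name `transpose_tiled` and the tile size of 32 are illustrative assumptions, not taken from any of the projects above. In CPython the interpreter overhead hides any cache effect, so this only shows the shape of the strategy: the same blocking, written in C or as a CUDA kernel that stages each tile through shared memory, is where the payoff shows up.

```python
# Tile edge length: an illustrative assumption, not a tuned value.
TILE = 32


def transpose_tiled(a, tile=TILE):
    """Return the transpose of the list-of-lists matrix `a`.

    The loops walk `a` in tile x tile blocks, so each block's source rows and
    destination columns are touched together before moving on. This is the
    loop structure a compiled implementation would use to keep a block in cache.
    """
    rows = len(a)
    cols = len(a[0]) if rows else 0
    out = [[None] * rows for _ in range(cols)]
    for ii in range(0, rows, tile):
        for jj in range(0, cols, tile):
            # min() clamps the last block when the size is not a multiple of tile.
            for i in range(ii, min(ii + tile, rows)):
                for j in range(jj, min(jj + tile, cols)):
                    out[j][i] = a[i][j]
    return out


if __name__ == "__main__":
    m = [[r * 10 + c for c in range(3)] for r in range(2)]
    print(transpose_tiled(m, tile=2))  # [[0, 10], [1, 11], [2, 12]]
```

The result is identical to a naive transpose; only the order in which memory is visited changes, which is exactly the part a programmer has to choose explicitly.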

futhark

Mentions: 52 | Stars: 2,291 | Activity: 9.8 | Language: Haskell

💥💻💥 A data-parallel functional programming language

I cannot overstate the importance of using a programming language that targets GPUs directly, like Futhark (https://github.com/diku-dk/futhark). In this case, it is a functional, declarative language where you can focus on the why, not the how. Just as with CPUs, which are also incredibly complex, higher-level abstractions are very important.
Compared with hand-written code from a pro GPU programmer who had 10 years to spend on it, Futhark code might be around 10x slower. But just as we do not program in assembly when writing performance-critical software, most non-trivial things are easier to write in it.

KernelAbstractions.jl

Mentions: 4 | Stars: 331 | Activity: 8.0 | Language: Julia

Heterogeneous programming in Julia

> Higher-level abstractions
Like these?
https://github.com/JuliaGPU/KernelAbstractions.jl

Tullio.jl

Mentions: 4 | Stars: 581 | Activity: 5.2 | Language: Julia

⅀

FoldsCUDA.jl

Mentions: 1 | Stars: 54 | Activity: 0.0 | Language: Julia

Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
