SaaSHub helps you find the best software and product alternatives Learn more →
Top 23 C++ Gpgpu Projects
-
FluidX3D
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
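The core update behind lattice Boltzmann solvers fits in a few lines. The following is a minimal textbook sketch, assuming a D2Q9 lattice with the single-relaxation-time (BGK) collision operator in lattice units. It is not FluidX3D's implementation, and the names `equilibrium`, `moments` and `collide_bgk` are illustrative.

```cpp
#include <array>
#include <cassert>

// Textbook D2Q9 lattice (NOT FluidX3D's internal layout):
// index 0 = rest, 1..4 = axis-aligned neighbours, 5..8 = diagonals.
constexpr int Q = 9;
constexpr std::array<int, Q> cx = {0, 1, 0, -1, 0, 1, -1, -1, 1};
constexpr std::array<int, Q> cy = {0, 0, 1, 0, -1, 1, 1, -1, -1};
constexpr std::array<double, Q> w = {4.0 / 9,  1.0 / 9,  1.0 / 9,
                                     1.0 / 9,  1.0 / 9,  1.0 / 36,
                                     1.0 / 36, 1.0 / 36, 1.0 / 36};

using Populations = std::array<double, Q>;

// Second-order equilibrium in lattice units (speed of sound c_s^2 = 1/3).
Populations equilibrium(double rho, double ux, double uy) {
    Populations feq{};
    const double uu = ux * ux + uy * uy;
    for (int i = 0; i < Q; ++i) {
        const double cu = cx[i] * ux + cy[i] * uy;
        feq[i] = w[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * uu);
    }
    return feq;
}

// Zeroth and first moments of one cell: density and momentum.
void moments(const Populations& f, double& rho, double& jx, double& jy) {
    rho = jx = jy = 0.0;
    for (int i = 0; i < Q; ++i) {
        rho += f[i];
        jx += f[i] * cx[i];
        jy += f[i] * cy[i];
    }
}

// BGK collision: relax each population toward equilibrium with time scale tau.
// Mass and momentum are conserved because f_eq has the same moments as f.
void collide_bgk(Populations& f, double tau) {
    assert(tau > 0.5 && "BGK relaxation is unstable for tau <= 0.5");
    double rho, jx, jy;
    moments(f, rho, jx, jy);
    const Populations feq = equilibrium(rho, jx / rho, jy / rho);
    for (int i = 0; i < Q; ++i) f[i] -= (f[i] - feq[i]) / tau;
}
```

A full time step also streams each population to the neighbouring cell along its velocity; that part is omitted here to keep the sketch to a single cell.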
-
kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
Project mention: Ask HN: How to learn CUDA to professional level | news.ycombinator.com | 2025-06-08
AdaptiveCpp
Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
Project mention: AdaptiveCpp's new Metal backend to support CUDA dialect on Apple GPUs | news.ycombinator.com | 2026-02-27
-
cuda-api-wrappers
Thin C++-flavored header-only wrappers for core CUDA APIs: Runtime, Driver, NVRTC, NVTX.
> CUDA Runtime: The runtime library (libcudart) that applications link against.
That library is actually a rather poor idea. If you're writing a CUDA application, I strongly recommend avoiding the "runtime API". It provides partial access to the actual CUDA driver and its API, which is 'simpler' in the sense that you don't explicitly create "contexts", but:
* It hides or limits a lot of the functionality.
* Its actual behavior vis-a-vis contexts is not at all simple and is likely to make your life more difficult down the road.
* It's not some clean interface that's much more convenient to use.
So, either go with the driver, or consider my CUDA API wrappers library [1], which _does_ offer a clean, unified, modern (well, C++11'ish) RAII/CADRe interface. And it covers much more than the runtime API, to boot: JIT compilation of CUDA (nvrtc) and PTX (nvptx_compiler), profiling (nvtx), etc.
[1] : https://github.com/eyalroz/cuda-api-wrappers/
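The RAII approach the comment argues for can be illustrated without a GPU. This minimal sketch wraps a hypothetical C-style API (`fake_ctx_create` / `fake_ctx_destroy`, invented here; they are not CUDA driver functions and not cuda-api-wrappers' classes) in a move-only class that turns error codes into exceptions and releases the handle exactly once.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// ---- Hypothetical C-style API, standing in for a driver-like interface ----
// NOT the real CUDA driver API: names and semantics are invented for illustration.
struct fake_ctx { int device; };
static int g_live_contexts = 0;  // lets the caller observe leaks

int fake_ctx_create(int device, fake_ctx** out) {
    if (device < 0) return 1;  // nonzero status = error
    *out = new fake_ctx{device};
    ++g_live_contexts;
    return 0;
}

int fake_ctx_destroy(fake_ctx* ctx) {
    assert(ctx != nullptr);
    delete ctx;
    --g_live_contexts;
    return 0;
}

// ---- RAII wrapper: acquire in the constructor, release in the destructor ----
class context {
public:
    explicit context(int device) {
        fake_ctx* raw = nullptr;
        if (int status = fake_ctx_create(device, &raw); status != 0)
            throw std::runtime_error("context creation failed, status " +
                                     std::to_string(status));
        handle_.reset(raw);
    }
    int device() const { return handle_->device; }

private:
    struct deleter {
        void operator()(fake_ctx* c) const noexcept { fake_ctx_destroy(c); }
    };
    // unique_ptr makes the wrapper move-only and frees the handle exactly once.
    std::unique_ptr<fake_ctx, deleter> handle_;
};
```

The same shape, an owning `std::unique_ptr` with a custom deleter plus a constructor that throws on a nonzero status, applies to any create/destroy pair exposed by a C API.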
-
OpenCL-Wrapper
OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
-
GPUPrefixSums
A nearly complete collection of prefix sum algorithms implemented in CUDA, D3D12, Unity and WGPU. Theoretically portable to all wave/warp/subgroup sizes.
Lol do you think "PTX programming" is some kind of secret sauce of perf? It's just inline asm. Sometimes it's necessary but most of the time "CUDA is all you need":
https://github.com/b0nes164/GPUPrefixSums
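For readers new to prefix sums, the following is a minimal sketch of the classic work-efficient (Blelloch) exclusive scan, run sequentially on the CPU so the up-sweep and down-sweep phases are easy to follow. It is a textbook algorithm, not code from GPUPrefixSums; it assumes a power-of-two input length, and `blelloch_exclusive_scan` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work-efficient (Blelloch) exclusive scan, executed sequentially.
// Iterations of each inner loop are independent: on a GPU each would be
// handled by its own thread, here they simply run in order.
// Assumption: a.size() is a power of two (pad with zeros otherwise).
template <typename T>
void blelloch_exclusive_scan(std::vector<T>& a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0 && "length must be a power of two");

    // Up-sweep (reduce): accumulate partial sums up a balanced binary tree.
    for (std::size_t stride = 1; stride < n; stride *= 2)
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride)
            a[i] += a[i - stride];

    // Down-sweep: clear the root, then push prefixes back down the tree.
    a[n - 1] = T{};
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2)
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            const T left = a[i - stride];
            a[i - stride] = a[i];
            a[i] += left;
        }
}
```

For `{3, 1, 7, 0, 4, 1, 6, 3}` this produces `{0, 3, 4, 11, 11, 15, 16, 22}`: each output is the sum of all inputs strictly before it, computed with O(n) additions in O(log n) parallel steps.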
-
ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
I was asked this a few months back but don’t have the measurements fresh anymore. In general, I think TBB is one of the more thorough and feature-rich parallelism libraries out there. That said, I just found a comparable usage example in my benchmarks, and it doesn’t look like TBB will have the same low-latency profile as Fork Union: https://github.com/ashvardanian/ParallelReductionsBenchmark/...
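Summing many numbers fast on a GPU usually comes down to a tree-shaped reduction. The sketch below shows that pattern sequentially, assuming `double` inputs; `tree_reduce` is an illustrative name, and this is not code from ParallelReductionsBenchmark.

```cpp
#include <cstddef>
#include <vector>

// Tree (pairwise) reduction: at each step element i adds in element i + half,
// and the active range shrinks to ceil(n / 2). Iterations of the inner loop are
// independent, so on a GPU each would be one thread; here they run in order.
// Works for any length, including odd ones.
double tree_reduce(std::vector<double> v) {
    if (v.empty()) return 0.0;
    std::size_t n = v.size();
    while (n > 1) {
        const std::size_t half = (n + 1) / 2;  // ceil(n / 2) keeps the odd leftover
        for (std::size_t i = 0; i + half < n; ++i)
            v[i] += v[i + half];
        n = half;
    }
    return v[0];
}
```

Besides mapping onto parallel hardware, the tree shape keeps floating-point rounding error growing roughly with log n rather than n, which is one reason pairwise summation is preferred over a naive running sum.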
-
C++ Gpgpu discussion
C++ Gpgpu related posts
-
AdaptiveCpp's new Metal backend to support CUDA dialect on Apple GPUs
-
Prefix sum: 20 GB/s (2.6x baseline)
-
Ask HN: How to learn CUDA to professional level
-
AdaptiveCpp – Implementation of SYCL and C++ Parallelism for CPUs and GPUs
-
AdaptiveCpp: Implementation of SYCL and C++ CPUs and GPUs
-
AMA: GpuOwl/PRPLL, GPU software used to find the largest prime number
-
Gimps Discovers Largest Known Prime Number: 2^136279841 – 1
-
A note from our sponsor - SaaSHub
www.saashub.com | 6 Jun 2026
Index
What are some of the best open-source Gpgpu projects in C++? This list will help you:
| # | Project | GitHub stars |
|---|---|---|
| 1 | FluidX3D | 5,115 |
| 2 | ArrayFire | 4,887 |
| 3 | SHADERed | 4,727 |
| 4 | kompute | 2,517 |
| 5 | AdaptiveCpp | 1,869 |
| 6 | Boost.Compute | 1,653 |
| 7 | MatX | 1,429 |
| 8 | compute-runtime | 1,395 |
| 9 | stdgpu | 1,263 |
| 10 | cuda-api-wrappers | 890 |
| 11 | amgcl | 864 |
| 12 | vulkan_minimal_compute | 731 |
| 13 | VexCL | 721 |
| 14 | OpenCL-Wrapper | 475 |
| 15 | occa | 441 |
| 16 | BabelStream | 363 |
| 17 | opencl-intercept-layer | 360 |
| 18 | vuh | 351 |
| 19 | RayTracing | 343 |
| 20 | OpenCL-Benchmark | 298 |
| 21 | GPUPrefixSums | 290 |
| 22 | ParallelReductionsBenchmark | 118 |
| 23 | UE4_GPGPU_flocking | 80 |