Top 23 C++ GPGPU Projects
-
FluidX3D
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
-
kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
Project mention: Ask HN: How to learn CUDA to professional level | news.ycombinator.com | 2025-06-08
-
AdaptiveCpp
Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
Project mention: AdaptiveCpp – Implementation of SYCL and C++ Parallelism for CPUs and GPUs | news.ycombinator.com | 2025-01-02
-
-
cuda-api-wrappers
Thin C++-flavored header-only wrappers for core CUDA APIs: Runtime, Driver, NVRTC, NVTX.
Project mention: Nvidia Security Team: "What if we just stopped using C?" (2022) | news.ycombinator.com | 2025-02-13
> with the C++ API
The funny thing is that the "C++ API" is almost entirely C-like, forgoing almost everything beneficial and convenient about C++, while at the same time not being properly limited to C.
(which is why I wrote this: https://github.com/eyalroz/cuda-api-wrappers/ )
> an awful GPU mailbox design is still the cutting-edge tech
Can you elaborate on what you mean by a "mailbox design"?
-
For my tasks, I had some success with algebraic multigrid solvers as preconditioners, for example from AMGCL or PyAMG. They are also reasonably easy to get started with.
https://github.com/pyamg/pyamg
https://github.com/ddemidov/amgcl
But I only have to deal with positive definite systems, so YMMV.
I am not sure whether those libraries can deal with multiple right-hand sides, but most complexity is in the preconditioners anyway.
-
OpenCL-Wrapper
OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
-
Project mention: AMA: GpuOwl/PRPLL, GPU software used to find the largest prime number | news.ycombinator.com | 2024-10-25
Hi, I'm Mihai Preda, the author of GpuOwl/PRPLL [1], OpenCL software that was used by Luke Durant for his recent discovery of the largest known prime number, the 52nd Mersenne prime 2^136279841 - 1 [2].
Feel free to ask questions about technical aspects of the GpuOwl implementation, about optimizations, tricks, efficient FFT implementation on GPUs etc. Or anything else.
[1] GpuOwl: https://github.com/preda/gpuowl
-
GPUPrefixSums
A nearly complete collection of prefix sum algorithms implemented in CUDA, D3D12, Unity and WGPU. Theoretically portable to all wave/warp/subgroup sizes.
Project mention: GPUPrefixSums – state of the art GPU prefix sum algorithms | news.ycombinator.com | 2025-08-28
-
ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
Project mention: Memory-Level Parallelism: Apple M2 vs. Apple M4 | news.ycombinator.com | 2025-07-09
It’s a very interesting benchmark (https://github.com/lemire/TestingMLP), probably worth adding to the Phoronix suite.
Every couple of years I refresh my own parallel reduction benchmarks (https://github.com/ashvardanian/ParallelReductionsBenchmark), which are also memory-bound. Mine mostly focus on the boring but necessary throughput-maximizing cases on CPUs and GPUs.
Lately, as GPUs are pulled into more general data-processing tasks, I keep running into non-coalesced, pointer-chasing patterns — but I still don’t have a good mental model for estimating the cost of different access strategies. A crossover between these two topics — running MLP-style loads on GPUs — might be exactly the benchmark missing, in case someone is looking for a good weekend project!
C++ GPGPU related posts
- Ask HN: How to learn CUDA to professional level
- AdaptiveCpp – Implementation of SYCL and C++ Parallelism for CPUs and GPUs
- AdaptiveCpp: Implementation of SYCL and C++ CPUs and GPUs
- AMA: GpuOwl/PRPLL, GPU software used to find the largest prime number
- Gimps Discovers Largest Known Prime Number: 2^136279841 – 1
- New Mersenne Prime discovered (probably)
- AdaptiveCpp – SYCL implementation to run C++ on CPUs and GPUs
Index
What are some of the best open-source GPGPU projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | ArrayFire | 4,768 |
2 | FluidX3D | 4,631 |
3 | SHADERed | 4,537 |
4 | kompute | 2,319 |
5 | AdaptiveCpp | 1,685 |
6 | Boost.Compute | 1,624 |
7 | MatX | 1,349 |
8 | compute-runtime | 1,270 |
9 | stdgpu | 1,234 |
10 | cuda-api-wrappers | 856 |
11 | amgcl | 805 |
12 | vulkan_minimal_compute | 729 |
13 | VexCL | 715 |
14 | occa | 431 |
15 | OpenCL-Wrapper | 422 |
16 | vuh | 350 |
17 | BabelStream | 347 |
18 | opencl-intercept-layer | 340 |
19 | RayTracing | 336 |
20 | OpenCL-Benchmark | 242 |
21 | gpuowl | 203 |
22 | GPUPrefixSums | 161 |
23 | ParallelReductionsBenchmark | 105 |