Top 23 C++ GPGPU Projects

ArrayFire

1 6 4,768 8.5 C++

ArrayFire: a general purpose GPU library.
FluidX3D

2 54 4,631 8.3 C++

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.

Project mention: FluidX3D | news.ycombinator.com | 2024-12-07
SHADERed

3 24 4,537 0.0 C++

Lightweight, cross-platform & full-featured shader IDE
kompute

4 40 2,319 6.2 C++

General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.

Project mention: Ask HN: How to learn CUDA to professional level | news.ycombinator.com | 2025-06-08
AdaptiveCpp

5 23 1,685 9.6 C++

Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!

Project mention: AdaptiveCpp – Implementation of SYCL and C++ Parallelism for CPUs and GPUs | news.ycombinator.com | 2025-01-02
Boost.Compute

6 0 1,624 1.2 C++

A C++ GPU Computing Library for OpenCL
MatX

7 7 1,349 9.4 C++

An efficient C++17 GPU numerical computing library with Python-like syntax
compute-runtime

8 58 1,270 10.0 C++

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
stdgpu

9 0 1,234 6.0 C++

stdgpu: Efficient STL-like Data Structures on the GPU
cuda-api-wrappers

10 15 856 7.3 C++

Thin C++-flavored header-only wrappers for core CUDA APIs: Runtime, Driver, NVRTC, NVTX.

Project mention: Nvidia Security Team: "What if we just stopped using C?" (2022) | news.ycombinator.com | 2025-02-13

> with the C++ API
The funny thing is that the "C++ API" is almost entirely C-like, foregoing almost everything beneficial and convenient about C++, while at the same time not being properly limited to C.
(which is why I wrote this: https://github.com/eyalroz/cuda-api-wrappers/ )
> an awful GPU mailbox design is still the cutting-edge tech
Can you elaborate on what you mean by a "mailbox design"?
amgcl

11 2 805 3.8 C++

C++ library for solving large sparse linear systems with algebraic multigrid method

Project mention: CuPy: NumPy and SciPy for GPU | news.ycombinator.com | 2024-09-20

For my tasks, I had some success with algebraic multigrid solvers as preconditioner, for example from AMGCL or PyAMG. They are also reasonably easy to get started with.
https://github.com/pyamg/pyamg
https://github.com/ddemidov/amgcl
But I only have to deal with positive definite systems, so YMMV.
I am not sure whether those libraries can deal with multiple right-hand sides, but most complexity is in the preconditioners anyway.
vulkan_minimal_compute

12 1 729 0.0 C++

Minimal Example of Using Vulkan for Compute Operations. Only ~400LOC.
VexCL

13 0 715 5.2 C++

VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
occa

14 1 431 2.9 C++

Portable and vendor neutral framework for parallel programming on heterogeneous platforms.
OpenCL-Wrapper

15 7 422 7.4 C++

OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.
vuh

16 3 350 2.8 C++

Vulkan compute for people
BabelStream

17 1 347 4.4 C++

STREAM, for lots of devices written in many programming models
opencl-intercept-layer

18 9 340 7.8 C++

Intercept Layer for Debugging and Analyzing OpenCL Applications
RayTracing

19 1 336 4.6 C++

Realtime GPU Path tracer based on OpenCL and OpenGL (by AlexanderVeselov)
OpenCL-Benchmark

20 1 242 7.0 C++

A small OpenCL benchmark program to measure peak GPU/CPU performance.
gpuowl

21 4 203 8.9 C++

GPU Mersenne primality test.

Project mention: AMA: GpuOwl/PRPLL, GPU software used to find the largest prime number | news.ycombinator.com | 2024-10-25

Hi, I'm Mihai Preda, the author of GpuOwl/PRPLL [1], OpenCL software that was used by Luke Durant for his recent discovery of the largest known prime number, the 52nd Mersenne prime 2^136279841 - 1 [2].
Feel free to ask questions about technical aspects of the GpuOwl implementation, about optimizations, tricks, efficient FFT implementation on GPUs etc. Or anything else.
[1] GpuOwl: https://github.com/preda/gpuowl
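
For readers unfamiliar with Mersenne primality testing, the following is a minimal sketch of the classic Lucas-Lehmer test in C++. It is an illustration only and is not taken from GpuOwl: the function name `lucas_lehmer` is hypothetical, the sketch assumes a compiler with the `unsigned __int128` extension (GCC or Clang), and it only handles exponents up to 63. Exponents the size of 136279841 need multi-million-digit arithmetic with FFT-based multiplication, which is the hard part that GPU software addresses.

```cpp
#include <cstdint>
#include <stdexcept>

// Lucas-Lehmer test (illustrative sketch, not GpuOwl's code).
// For an odd prime p, M_p = 2^p - 1 is prime exactly when s_{p-2} == 0 (mod M_p),
// where s_0 = 4 and s_{k+1} = s_k^2 - 2.
// The caller is expected to pass a prime exponent p; the test says nothing
// useful about composite exponents.
bool lucas_lehmer(unsigned p) {
    if (p < 2 || p > 63) {
        // Outside the range this 64-bit sketch can represent.
        throw std::invalid_argument("exponent must be in [2, 63]");
    }
    if (p == 2) {
        return true;  // M_2 = 3 is prime; the recurrence covers odd p only.
    }
    const std::uint64_t m = (std::uint64_t{1} << p) - 1;  // M_p
    std::uint64_t s = 4;
    for (unsigned i = 0; i < p - 2; ++i) {
        // Square in 128 bits so the product cannot overflow.
        const unsigned __int128 sq = static_cast<unsigned __int128>(s) * s;
        // Add m before subtracting 2 so the value never goes negative.
        s = static_cast<std::uint64_t>((sq + m - 2) % m);
    }
    return s == 0;
}
```

Calling `lucas_lehmer(13)` checks M_13 = 8191, and `lucas_lehmer(11)` checks M_11 = 2047 = 23 × 89.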
GPUPrefixSums

22 1 161 5.0 C++

A nearly complete collection of prefix sum algorithms implemented in CUDA, D3D12, Unity and WGPU. Theoretically portable to all wave/warp/subgroup sizes.

Project mention: GPUPrefixSums – state of the art GPU prefix sum algorithms | news.ycombinator.com | 2025-08-28
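
As a reminder of what a prefix sum computes, here is a minimal sequential sketch in standard C++. It is not code from GPUPrefixSums, and the function names are hypothetical; it only pins down the inclusive and exclusive variants that the GPU algorithms parallelize.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive prefix sum: out[i] = in[0] + ... + in[i].
std::vector<int> inclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
std::vector<int> exclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of all elements before position i
        running += in[i];
    }
    return out;
}
```

For the input {1, 2, 3, 4}, the inclusive result is {1, 3, 6, 10} and the exclusive result is {0, 1, 3, 6}.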
ParallelReductionsBenchmark

23 3 105 9.0 C++

Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!

Project mention: Memory-Level Parallelism: Apple M2 vs. Apple M4 | news.ycombinator.com | 2025-07-09

It’s a very interesting benchmark (https://github.com/lemire/TestingMLP) — probably worth adding to the Phoronix suite.
Every couple of years I refresh my own parallel reduction benchmarks (https://github.com/ashvardanian/ParallelReductionsBenchmark), which are also memory-bound. Mine mostly focus on the boring but necessary throughput-maximizing cases on CPUs and GPUs.
Lately, as GPUs are pulled into more general data-processing tasks, I keep running into non-coalesced, pointer-chasing patterns — but I still don’t have a good mental model for estimating the cost of different access strategies. A crossover between these two topics — running MLP-style loads on GPUs — might be exactly the benchmark missing, in case someone is looking for a good weekend project!

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

C++ GPGPU discussion

C++ GPGPU related posts

Ask HN: How to learn CUDA to professional level

6 projects | news.ycombinator.com | 8 Jun 2025
AdaptiveCpp – Implementation of SYCL and C++ Parallelism for CPUs and GPUs

1 project | news.ycombinator.com | 2 Jan 2025
AdaptiveCpp: Implementation of SYCL and C++ CPUs and GPUs

1 project | news.ycombinator.com | 20 Dec 2024
AMA: GpuOwl/PRPLL, GPU software used to find the largest prime number

1 project | news.ycombinator.com | 25 Oct 2024
Gimps Discovers Largest Known Prime Number: 2^136279841 – 1

1 project | news.ycombinator.com | 21 Oct 2024
New Mersenne Prime discovered (probably)

1 project | news.ycombinator.com | 19 Oct 2024
AdaptiveCpp – SYCL implementation to run C++ on CPUs and GPUs

1 project | news.ycombinator.com | 24 Jul 2024

Index

What are some of the best open-source GPGPU projects in C++? This list will help you:

#	Project	Stars
1	ArrayFire	4,768
2	FluidX3D	4,631
3	SHADERed	4,537
4	kompute	2,319
5	AdaptiveCpp	1,685
6	Boost.Compute	1,624
7	MatX	1,349
8	compute-runtime	1,270
9	stdgpu	1,234
10	cuda-api-wrappers	856
11	amgcl	805
12	vulkan_minimal_compute	729
13	VexCL	715
14	occa	431
15	OpenCL-Wrapper	422
16	vuh	350
17	BabelStream	347
18	opencl-intercept-layer	340
19	RayTracing	336
20	OpenCL-Benchmark	242
21	gpuowl	203
22	GPUPrefixSums	161
23	ParallelReductionsBenchmark	105

Did you know that C++ is the 7th most popular programming language based on number of references?
