C++ GPGPU

Open-source C++ projects categorized as GPGPU

Top 23 C++ GPGPU Projects

  1. ArrayFire

    ArrayFire: a general purpose GPU library.

  3. SHADERed

    Lightweight, cross-platform & full-featured shader IDE

  4. FluidX3D

    The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.

    Project mention: FluidX3D | news.ycombinator.com | 2024-12-07
  5. kompute

    General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.

    Project mention: Ask HN: How to learn CUDA to professional level | news.ycombinator.com | 2025-06-08
  6. AdaptiveCpp

    Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!

    Project mention: AdaptiveCpp – Implementation of SYCL and C++ Parallelism for CPUs and GPUs | news.ycombinator.com | 2025-01-02
  7. Boost.Compute

    A C++ GPU Computing Library for OpenCL

  8. MatX

    An efficient C++17 GPU numerical computing library with Python-like syntax

  10. compute-runtime

    Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver

  11. stdgpu

    stdgpu: Efficient STL-like Data Structures on the GPU

  12. cuda-api-wrappers

    Thin C++-flavored header-only wrappers for core CUDA APIs: Runtime, Driver, NVRTC, NVTX.

    Project mention: Nvidia Security Team: "What if we just stopped using C?" (2022) | news.ycombinator.com | 2025-02-13

    > with the C++ API

    The funny thing is that the "C++ API" is almost entirely C-like, foregoing almost everything beneficial and convenient about C++, while at the same time not being properly limited to C.

    (which is why I wrote this: https://github.com/eyalroz/cuda-api-wrappers/ )

    > an awful GPU mailbox design is still the cutting-edge tech

    Can you elaborate on what you mean by a "mailbox design"?

  13. amgcl

    C++ library for solving large sparse linear systems with algebraic multigrid method

    Project mention: CuPy: NumPy and SciPy for GPU | news.ycombinator.com | 2024-09-20

    For my tasks, I had some success with algebraic multigrid solvers as preconditioner, for example from AMGCL or PyAMG. They are also reasonably easy to get started with.

    https://github.com/pyamg/pyamg

    https://github.com/ddemidov/amgcl

    But I only have to deal with positive definite systems, so YMMV.

    I am not sure whether those libraries can deal with multiple right-hand sides, but most complexity is in the preconditioners anyway.

  14. vulkan_minimal_compute

    Minimal Example of Using Vulkan for Compute Operations. Only ~400LOC.

  15. VexCL

    VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP

  16. occa

    Portable and vendor neutral framework for parallel programming on heterogeneous platforms.

  17. OpenCL-Wrapper

    OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.

  18. vuh

    Vulkan compute for people

  19. BabelStream

    STREAM, for lots of devices written in many programming models

  20. opencl-intercept-layer

    Intercept Layer for Debugging and Analyzing OpenCL Applications

  21. RayTracing

    Realtime GPU Path tracer based on OpenCL and OpenGL (by AlexanderVeselov)

  22. OpenCL-Benchmark

    A small OpenCL benchmark program to measure peak GPU/CPU performance.

  23. gpuowl

    GPU Mersenne primality test.

    Project mention: AMA: GpuOwl/PRPLL, GPU software used to find the largest prime number | news.ycombinator.com | 2024-10-25

    Hi, I'm Mihai Preda, the author of GpuOwl/PRPLL [1], OpenCL software that was used by Luke Durant for his recent discovery of the largest known prime number, the 52nd Mersenne prime 2^136279841 - 1 [2].

    Feel free to ask questions about technical aspects of the GpuOwl implementation, about optimizations, tricks, efficient FFT implementation on GPUs etc. Or anything else.

    [1] GpuOwl: https://github.com/preda/gpuowl

  24. ParallelReductionsBenchmark

    Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!

    Project mention: Memory-Level Parallelism: Apple M2 vs. Apple M4 | news.ycombinator.com | 2025-07-09

    It’s a very interesting benchmark (https://github.com/lemire/TestingMLP) — probably worth adding to the Phoronix suite.

    Every couple of years I refresh my own parallel reduction benchmarks (https://github.com/ashvardanian/ParallelReductionsBenchmark), which are also memory-bound. Mine mostly focus on the boring but necessary throughput-maximizing cases on CPUs and GPUs.

    Lately, as GPUs are pulled into more general data-processing tasks, I keep running into non-coalesced, pointer-chasing patterns — but I still don’t have a good mental model for estimating the cost of different access strategies. A crossover between these two topics — running MLP-style loads on GPUs — might be exactly the benchmark missing, in case someone is looking for a good weekend project!

  25. UE4_GPGPU_flocking

    Doing flocking/Boids in UE4 using GPGPU

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

C++ GPGPU related posts

  • Ask HN: How to learn CUDA to professional level

    6 projects | news.ycombinator.com | 8 Jun 2025
  • AdaptiveCpp – Implementation of SYCL and C++ Parallelism for CPUs and GPUs

    1 project | news.ycombinator.com | 2 Jan 2025
  • AdaptiveCpp: Implementation of SYCL and C++ CPUs and GPUs

    1 project | news.ycombinator.com | 20 Dec 2024
  • AMA: GpuOwl/PRPLL, GPU software used to find the largest prime number

    1 project | news.ycombinator.com | 25 Oct 2024
  • Gimps Discovers Largest Known Prime Number: 2^136279841 – 1

    1 project | news.ycombinator.com | 21 Oct 2024
  • New Mersenne Prime discovered (probably)

    1 project | news.ycombinator.com | 19 Oct 2024
  • AdaptiveCpp – SYCL implementation to run C++ on CPUs and GPUs

    1 project | news.ycombinator.com | 24 Jul 2024

Index

What are some of the best open-source GPGPU projects in C++? This list will help you:

 #  Project                      Stars
 1  ArrayFire                    4,736
 2  SHADERed                     4,537
 3  FluidX3D                     4,509
 4  kompute                      2,258
 5  AdaptiveCpp                  1,663
 6  Boost.Compute                1,615
 7  MatX                         1,338
 8  compute-runtime              1,255
 9  stdgpu                       1,226
10  cuda-api-wrappers              846
11  amgcl                          800
12  vulkan_minimal_compute         729
13  VexCL                          714
14  occa                           423
15  OpenCL-Wrapper                 412
16  vuh                            350
17  BabelStream                    345
18  opencl-intercept-layer         332
19  RayTracing                     325
20  OpenCL-Benchmark               230
21  gpuowl                         197
22  ParallelReductionsBenchmark     99
23  UE4_GPGPU_flocking              80

Did you know that C++ is the 7th most popular programming language based on number of references?