Top 23 C++ HPC Projects
-
FluidX3D
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
-
less_slow.cpp
Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, and Assembly, from numerics and SIMD to coroutines, ranges, exception handling, networking, and user-space IO
Thanks, appreciate the gesture :)
Traditional SWAR on GPUs is a fascinating topic. I've begun assembling a set of synthetic benchmarks to compare DP4A vs. DPX (<https://github.com/ashvardanian/less_slow.cpp/pull/35>), but it feels incomplete without SWAR. My working hypothesis is that 64-bit SWAR on properly aligned data could be very useful in GPGPU, though FMA/MIN/MAX operations in that PR might not be the clearest showcase of its strengths. Do you have a better example or use case in mind?
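For readers unfamiliar with the term, SWAR (SIMD Within A Register) packs several narrow lanes into one ordinary integer and processes them with plain scalar instructions. The following is a minimal CPU-side sketch in standard C++, not code from less_slow.cpp or the linked PR; the function name `swar_add_u8` is hypothetical. It adds eight unsigned 8-bit lanes packed into a 64-bit word, with each lane wrapping around independently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of less_slow.cpp): lane-wise addition of
// eight packed 8-bit values, each wrapping modulo 256 without carrying
// into the neighbouring lane.
constexpr std::uint64_t swar_add_u8(std::uint64_t a, std::uint64_t b) {
    constexpr std::uint64_t low7 = 0x7F7F7F7F7F7F7F7Full;  // low 7 bits of every lane
    constexpr std::uint64_t high = 0x8080808080808080ull;  // top bit of every lane

    // Add the low 7 bits of each lane. The largest per-lane result is
    // 0x7F + 0x7F = 0xFE, so no carry can leave a lane.
    std::uint64_t sum = (a & low7) + (b & low7);

    // Fold the top bits back in with XOR, which is addition without carry-out.
    return sum ^ ((a ^ b) & high);
}

// Lane 0: 0xFF + 0x01 wraps to 0x00. Lane 1: 0x01 + 0x00 stays 0x01.
// A plain 64-bit add would have produced 0x0200 instead.
static_assert(swar_add_u8(0x01FFull, 0x0001ull) == 0x0100ull,
              "lanes must wrap independently");
```

The same masking idea extends to wider lanes, but whether it pays off on a GPU compared with dedicated instructions such as DP4A or DPX is exactly the open question raised in the comment above.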
-
Project mention: Learning Assembly for Fun, Performance and Profit | news.ycombinator.com | 2025-04-12
So I would say skill at GPU assembly is in demand for the elite tier of GPU performance work. Not necessarily writing much of it (though see [1] for an example, this is the kernel of multisplit as used in Nvidia's Onesweep implementation), but definitely being able to read it so you can understand what the compiled code is actually doing. I'll also cite as evidence the incredible work of the engineers on Nanite. They describe writing the core of the microtriangle software renderer in HLSL but analyzing the assembler output to optimize down to the cycle level, as described in their "deep dive into Nanite virtualized geometry" talk [2] (timestamp points to the reference to instruction-level micro-optimization).
[1]: https://github.com/NVIDIA/cccl/blob/2d1fa6bc9235106740d9373c...
[2]: https://www.youtube.com/watch?v=eviSykqSUUw&t=2073s
-
AdaptiveCpp
Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
Project mention: AdaptiveCpp – Implementation of SYCL and C++ Parallelism for CPUs and GPUs | news.ycombinator.com | 2025-01-02
-
Here are a bunch of simple examples: https://github.com/jfalcou/eve/blob/fb093a0553d25bb8114f1396...
I personally think we have the following strengths:
* Algorithms. Writing SIMD loops is very hard. We give you a lot of ready-to-go loops (find, search, remove, set_intersection, to name a few).
-
qmcpack
Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
C++ HPC discussion
C++ HPC related posts
- Show HN: Less Slow C++
- ChipStar: Run CUDA/Hip on SPIR-V via OpenCL/Level Zero
- An efficient C++17 GPU numerical computing library with Python-like syntax
- MatX: Efficient C++17 GPU numerical computing library with Python-like syntax
- Learn WebGPU
- Standard way of doing maths with arrays?
- Blaze: High Performance Mathematics In C++
Index
What are some of the best open-source HPC projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | ArrayFire | 4,692 |
2 | FluidX3D | 4,414 |
3 | mfem | 1,885 |
4 | less_slow.cpp | 1,747 |
5 | VkFFT | 1,631 |
6 | cccl | 1,629 |
7 | AdaptiveCpp | 1,618 |
8 | Boost.Compute | 1,606 |
9 | MatX | 1,321 |
10 | Trilinos | 1,282 |
11 | eve | 1,188 |
12 | RaftLib | 965 |
13 | Fastor | 779 |
14 | oneMath | 672 |
15 | relion | 476 |
16 | ginkgo | 464 |
17 | occa | 418 |
18 | blitz | 413 |
19 | alpaka | 379 |
20 | Umpire | 357 |
21 | BabelStream | 335 |
22 | qmcpack | 334 |
23 | nekRS | 327 |