Top 23 C++ parallel-computing Projects
-
Project mention: Show HN: Coros – A Modern C++ Library for Task Parallelism | news.ycombinator.com | 2024-09-25
Martin, have you had a look at https://github.com/taskflow/taskflow ?
-
Thanks for the added context on the builds! As a "foreign" BW player and fellow speech processing researcher, I agree shallow contextual biasing should help. While not difficult to implement, most generally available ASR solutions don't make it easy to use. There's a PR in ctranslate2 implementing the same feature so that it could be exposed in faster-whisper: https://github.com/OpenNMT/CTranslate2/pull/1789
-
kokkos
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
-
-
Project mention: Learning Assembly for Fun, Performance and Profit | news.ycombinator.com | 2025-04-12
So I would say skill at GPU assembly is in-demand for the elite tier of GPU performance work. Not necessarily writing much of it (though see [1] for an example, this is the kernel of multisplit as used in Nvidia's Onesweep implementation), but definitely in being able to read it so you can understand what the compiled code is actually doing. I'll also cite as evidence of that the incredible work of the engineers on Nanite. They describe writing the core of the microtriangle software renderer in HLSL but analyzing the assembler output to optimize down to the cycle level, as described in their "deep dive into Nanite virtualized geometry" talk [2] (timestamp points to the reference to instruction-level micro-optimization).
[1]: https://github.com/NVIDIA/cccl/blob/2d1fa6bc9235106740d9373c...
[2]: https://www.youtube.com/watch?v=eviSykqSUUw&t=2073s
-
Project mention: Understanding SIMD: Infinite Complexity of Trivial Problems | news.ycombinator.com | 2024-11-30
I'm surprised no one has mentioned Vc. I found ispc clunky and not as performant, and std::simd didn't support some useful math ops like rsqrt. Vc has been around for years, I have no trouble including it in my codes, it has masking and many of the most useful math ops, and I can get over 1 TF/s on a consumer-grade Ryzen and at least 3 TF/s on the big Epyc CPUs.
https://github.com/VcDevel/Vc
-
Kratos
Kratos Multiphysics (a.k.a. Kratos) is a framework for building parallel multi-disciplinary simulation software. Modularity, extensibility and HPC are the main objectives. Kratos has a BSD license and is written in C++ with an extensive Python interface. (by KratosMultiphysics)
-
-
libfork
A bleeding-edge, lock-free, wait-free, continuation-stealing tasking library built on C++20's coroutines
-
-
-
-
coros
An easy-to-use and fast library for task-based parallelism, utilizing coroutines. (by mtmucha)
Project mention: Show HN: Coros – A Modern C++ Library for Task Parallelism | news.ycombinator.com | 2024-09-25
In your dequeue/circular buffer implementation, how is it able to grow the queue without locking?
The code seems to rely on atomics for head & tail, but grows the queue without any special provisions I can see.
https://github.com/mtmucha/coros/blob/ee30d3c1d0602c3071aa26...
-
-
-
areg-sdk
AREG is a cross-platform asynchronous Object RPC framework to simplify multitasking programming by blurring borders between processes and treating remote objects as if they coexist in the same thread.
-
-
ConcurrentDeque
Fast, generalized, implementation of the Chase-Lev lock-free work-stealing deque for C++17
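To make the Chase-Lev idea concrete, here is a minimal sketch of a work-stealing deque: the owner thread pushes and pops at the bottom, while other threads steal from the top. The class name `FixedWorkStealingDeque` and its interface are hypothetical and are not the ConcurrentDeque library's API. The sketch also assumes a fixed power-of-two capacity and a trivially copyable element type, so it sidesteps buffer growth entirely. The memory orderings follow the commonly published C++11 formulation of the algorithm.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical illustration only; not the API of any library listed here.
// T must be trivially copyable so it can live inside std::atomic<T>.
template <typename T>
class FixedWorkStealingDeque {
public:
    // Assumption: capacity_pow2 is a power of two (for example 4, 8, 1024).
    explicit FixedWorkStealingDeque(std::size_t capacity_pow2)
        : mask_(static_cast<std::int64_t>(capacity_pow2) - 1),
          buffer_(capacity_pow2) {}

    // Owner thread only. Returns false when the buffer is full.
    bool push(T value) {
        const std::int64_t b = bottom_.load(std::memory_order_relaxed);
        const std::int64_t t = top_.load(std::memory_order_acquire);
        if (b - t > mask_) return false;  // no free slot, and no growth here
        slot(b).store(value, std::memory_order_relaxed);
        // Publish the element before making the new bottom visible.
        std::atomic_thread_fence(std::memory_order_release);
        bottom_.store(b + 1, std::memory_order_relaxed);
        return true;
    }

    // Owner thread only. Takes the most recently pushed element (LIFO).
    std::optional<T> pop() {
        const std::int64_t b = bottom_.load(std::memory_order_relaxed) - 1;
        bottom_.store(b, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        std::int64_t t = top_.load(std::memory_order_relaxed);
        if (t > b) {  // deque was empty: restore bottom
            bottom_.store(b + 1, std::memory_order_relaxed);
            return std::nullopt;
        }
        const T value = slot(b).load(std::memory_order_relaxed);
        if (t == b) {
            // Last element: race against thieves with a CAS on top.
            const bool won = top_.compare_exchange_strong(
                t, t + 1, std::memory_order_seq_cst, std::memory_order_relaxed);
            bottom_.store(b + 1, std::memory_order_relaxed);
            if (!won) return std::nullopt;
        }
        return value;
    }

    // Any thread. Takes the oldest element (FIFO); may fail under contention.
    std::optional<T> steal() {
        std::int64_t t = top_.load(std::memory_order_acquire);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        const std::int64_t b = bottom_.load(std::memory_order_acquire);
        if (t >= b) return std::nullopt;  // nothing to steal
        const T value = slot(t).load(std::memory_order_relaxed);
        if (!top_.compare_exchange_strong(
                t, t + 1, std::memory_order_seq_cst, std::memory_order_relaxed)) {
            return std::nullopt;  // another thread claimed this slot first
        }
        return value;
    }

private:
    // Map a monotonically increasing index onto the circular buffer.
    std::atomic<T>& slot(std::int64_t index) {
        return buffer_[static_cast<std::size_t>(index & mask_)];
    }

    std::atomic<std::int64_t> top_{0};
    std::atomic<std::int64_t> bottom_{0};
    std::int64_t mask_;
    std::vector<std::atomic<T>> buffer_;
};
```

In a scheduler built on this pattern, each worker would typically own one such deque, call `push` and `pop` on its own, and call `steal` on a randomly chosen victim when it runs out of work. Growing the buffer safely while thieves may still be reading the old one needs extra machinery (for example, deferring reclamation of the old buffer), which this sketch deliberately leaves out.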
-
-
Lazy
Light-weight header-only library for parallel function calls and continuations in C++ based on Eric Niebler's talk at CppCon 2019.
-
-
parallel-dfs-dag
A parallel implementation of DFS for Directed Acyclic Graphs (https://research.nvidia.com/publication/parallel-depth-first-search-directed-acyclic-graphs)
-
C++ parallel-computing discussion
C++ parallel-computing related posts
-
Show HN: Coros – A Modern C++ Library for Task Parallelism
-
rodin alternatives - mfem and FreeFem-sources
7 projects | 8 Mar 2023
-
Learn PDE constrained optimization
-
Open source FEA tools instead of ANSYS Workbench and APDL
-
Eighty Years of the Finite Element Method: Birth, Evolution, and Future
-
Fortran on GPU
-
Best Python package(s) to solve PDEs numerically?
Index
What are some of the best open-source parallel-computing projects in C++? This list will help you:
# | Project | Stars |
---|---|---|
1 | Taskflow | 10,848 |
2 | CTranslate2 | 3,799 |
3 | kokkos | 2,192 |
4 | mfem | 1,889 |
5 | cccl | 1,637 |
6 | Vc | 1,485 |
7 | Kratos | 1,107 |
8 | dolfinx | 895 |
9 | libfork | 700 |
10 | oneMath | 675 |
11 | RAJA | 519 |
12 | parlaylib | 363 |
13 | coros | 323 |
14 | feelpp | 320 |
15 | PothosCore | 312 |
16 | areg-sdk | 295 |
17 | CPURasterizer | 179 |
18 | ConcurrentDeque | 144 |
19 | cppRouting | 114 |
20 | Lazy | 112 |
21 | Bulk | 94 |
22 | parallel-dfs-dag | 50 |
23 | libGPGPU | 11 |