- ParallelReductionsBenchmark VS laser
- ParallelReductionsBenchmark VS ispc
- ParallelReductionsBenchmark VS eaminer
- ParallelReductionsBenchmark VS alpaka
- ParallelReductionsBenchmark VS MatX
- ParallelReductionsBenchmark VS gpuowl
- ParallelReductionsBenchmark VS relion
- ParallelReductionsBenchmark VS cuda_memtest
- ParallelReductionsBenchmark VS vuda
- ParallelReductionsBenchmark VS vuh
ParallelReductionsBenchmark Alternatives
Similar projects and alternatives to ParallelReductionsBenchmark
- laser
  The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers (by mratsim)
- eaminer
  Heterogeneous Ethereum Miner with support for AMD, Intel and Nvidia GPUs using SYCL, OpenCL and CUDA backends
- vuda
  VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.
ParallelReductionsBenchmark discussion
ParallelReductionsBenchmark reviews and mentions
- Failing to Reach 204 GB/S DDR4 Bandwidth
  For the single-threaded version, they have a data hazard on the sums: each addition has to wait for the previous one. That could be smoothed out with a little loop unrolling and separate accumulator variables.
  But in the [threaded version](https://github.com/unum-cloud/ParallelReductions/blob/fd16d9...) they have a separate accumulator slot per thread, yet the slots still sit in a shared vector, which most likely has the issue I described.
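To make the two suggestions in this comment concrete, here is a minimal C++ sketch. It is not taken from the ParallelReductionsBenchmark repository: the names `sum_unrolled`, `sum_threaded`, and `PaddedSum` are hypothetical, the 64-byte cache-line size is an assumption, and it also assumes that the "issue" with adjacent slots in a shared vector is false sharing, which the comment does not state outright.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Single-threaded sum with four independent accumulators, so consecutive
// additions do not all wait on one running total.
float sum_unrolled(const float* data, std::size_t n) {
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < n; ++i) s0 += data[i];  // leftover tail elements
    return (s0 + s1) + (s2 + s3);
}

// One result slot per thread, padded to an assumed 64-byte cache line so
// that neighbouring slots never share a line (requires C++17 for the
// over-aligned allocation inside std::vector).
struct alignas(64) PaddedSum {
    float value = 0;
};

float sum_threaded(const float* data, std::size_t n, std::size_t thread_count) {
    if (thread_count == 0) thread_count = 1;
    std::vector<PaddedSum> slots(thread_count);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + thread_count - 1) / thread_count;
    for (std::size_t t = 0; t < thread_count; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(n, t * chunk);
            const std::size_t end = std::min(n, begin + chunk);
            // Accumulate in locals, then write the shared slot exactly once.
            slots[t].value = sum_unrolled(data + begin, end - begin);
        });
    }
    for (auto& w : workers) w.join();
    float total = 0;
    for (const auto& s : slots) total += s.value;
    return total;
}
```

Accumulating into local variables and writing each slot once already removes most cross-thread traffic; the padding is a second safeguard in case slots are updated inside the hot loop. Whether either change moves the measured bandwidth would have to be checked on the actual benchmark.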
Stats
The primary programming language of ParallelReductionsBenchmark is C++.