- ParallelReductionsBenchmark VS MatX
- ParallelReductionsBenchmark VS ispc
- ParallelReductionsBenchmark VS gpuowl
- ParallelReductionsBenchmark VS alpaka
- ParallelReductionsBenchmark VS cuda_memtest
- ParallelReductionsBenchmark VS amgcl
- ParallelReductionsBenchmark VS eaminer
- ParallelReductionsBenchmark VS relion
- ParallelReductionsBenchmark VS laser
- ParallelReductionsBenchmark VS vuda
ParallelReductionsBenchmark Alternatives
Similar projects and alternatives to ParallelReductionsBenchmark
-
eaminer
Heterogeneous Ethereum Miner with support for AMD, Intel and Nvidia GPUs using SYCL, OpenCL and CUDA backends
-
laser
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers (by mratsim)
-
vuda
VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.
ParallelReductionsBenchmark reviews and mentions
-
Failing to Reach 204 GB/S DDR4 Bandwidth
In the single-threaded version, they have a data hazard on the sums that could be smoothed out with a little loop unrolling and separate accumulator variables.
But in the [threaded version](https://github.com/unum-cloud/ParallelReductions/blob/fd16d9...), each thread gets a separate accumulator slot, yet the slots still live in a shared vector, which most likely has the issue I described.
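The two suggestions in this comment can be sketched roughly as follows. This is a minimal illustration, not code from the ParallelReductionsBenchmark repository: the names `sum_unrolled`, `PaddedSum`, and `sum_threaded` are hypothetical, the 64-byte cache-line size is an assumption, and it assumes the "issue" the commenter refers to is false sharing between adjacent accumulator slots, which the excerpt does not state explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Single-threaded sum with four independent accumulators, so consecutive
// additions do not all depend on the result of the previous one.
float sum_unrolled(const float* data, std::size_t n) {
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < n; ++i) s0 += data[i];  // leftover elements
    return (s0 + s1) + (s2 + s3);
}

// One result slot per thread, padded to its own cache line.
// Assumption: 64-byte cache lines; requires C++17 so that std::vector
// honors the over-alignment when allocating.
struct alignas(64) PaddedSum {
    float value = 0;
};

float sum_threaded(const float* data, std::size_t n, std::size_t thread_count) {
    assert(thread_count > 0);
    std::vector<PaddedSum> partial(thread_count);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + thread_count - 1) / thread_count;
    for (std::size_t t = 0; t < thread_count; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(t * chunk, n);
            const std::size_t end = std::min(begin + chunk, n);
            // Accumulate locally and write the shared slot only once.
            partial[t].value = sum_unrolled(data + begin, end - begin);
        });
    }
    for (auto& w : workers) w.join();
    float total = 0;
    for (const auto& p : partial) total += p.value;
    return total;
}
```

Accumulating into a local variable and writing the shared slot once already removes most cross-thread cache traffic; the padding additionally keeps the final writes from landing on the same cache line. Whether either change helps in practice depends on the hardware and compiler, so it should be measured rather than assumed.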
Stats
The primary programming language of ParallelReductionsBenchmark is C++.