ParallelReductionsBenchmark

Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast! (by ashvardanian)

ParallelReductionsBenchmark Alternatives

Similar projects and alternatives to ParallelReductionsBenchmark

  1. laser

    The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers (by mratsim)

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. ispc

    Intel® Implicit SPMD Program Compiler

  4. eaminer

    Heterogeneous Ethereum Miner with support for AMD, Intel and Nvidia GPUs using SYCL, OpenCL and CUDA backends

  5. alpaka

    Abstraction Library for Parallel Kernel Acceleration :llama: (by alpaka-group)

  6. MatX

    An efficient C++17 GPU numerical computing library with Python-like syntax

  7. gpuowl

    GPU Mersenne primality test.

  8. relion

    Image-processing software for cryo-electron microscopy

  9. cuda_memtest

    Fork of CUDA GPU memtest :eyeglasses:

  10. vuda

    VUDA is a header-only library based on Vulkan that provides a CUDA Runtime API interface for writing GPU-accelerated applications.

  11. numactl

    NUMA support for Linux

  12. vuh

    Vulkan compute for people

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better ParallelReductionsBenchmark alternative or a higher similarity.


ParallelReductionsBenchmark reviews and mentions

Posts with mentions or reviews of ParallelReductionsBenchmark. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-02-02.
  • Failing to Reach 204 GB/S DDR4 Bandwidth
    3 projects | news.ycombinator.com | 2 Feb 2022
    For the single-threaded version, they have a data hazard on the sums: every addition depends on the previous one. That could be smoothed out with a little loop unrolling and separate accumulator variables.

    But in the [threaded version](https://github.com/unum-cloud/ParallelReductions/blob/fd16d9...) each thread has its own accumulator slot, yet the slots still sit in a shared vector, which most likely has the issue I described.

Stats

Basic ParallelReductionsBenchmark repo stats
2
74
6.6
6 days ago

