The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
Why do you think that https://github.com/ashvardanian/ParallelReductionsBenchmark is a good alternative to Laser?