mixbench
despacer
| | mixbench | despacer |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 340 | 147 |
| Growth | - | - |
| Activity | 5.2 | 5.6 |
| Last commit | 2 months ago | 5 months ago |
| Language | C++ | C |
| License | GNU General Public License v3.0 only | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
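The exact formula behind the activity number is not given here. As a rough illustration only, the sketch below assumes a hypothetical recency weight (a commit from today counts 1.0, a 30-day-old commit counts 0.5) and maps a project's raw score onto a 0-10 scale by its rank among all tracked projects, so that 9.0 corresponds to the top 10%. The function names and the weighting curve are assumptions, not the site's actual method.

```c
#include <stddef.h>

/* Hypothetical recency weight (an assumption for illustration):
   a commit made today counts 1.0, a commit 30 days old counts 0.5,
   and older commits count progressively less. */
static double commit_weight(double age_days) {
    return 1.0 / (1.0 + age_days / 30.0);
}

/* Raw activity: the sum of the recency weights of all commits. */
double raw_activity(const double *commit_ages_days, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += commit_weight(commit_ages_days[i]);
    }
    return total;
}

/* Relative activity on a 0-10 scale: ten times the fraction of tracked
   projects whose raw activity is strictly lower than this project's. */
double relative_activity(double own_raw, const double *all_raw, size_t n) {
    if (n == 0) {
        return 0.0;
    }
    size_t lower = 0;
    for (size_t i = 0; i < n; i++) {
        if (all_raw[i] < own_raw) {
            lower++;
        }
    }
    return 10.0 * (double)lower / (double)n;
}
```

Under these assumptions, a project that outranks nine of ten tracked projects gets a relative activity of 9.0, which matches the "top 10%" reading above.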
mixbench
-
Intel Nukes Alder Lake's AVX-512 Support, Now Fuses It Off in Silicon
The results I get match the FLOPS figures stated for the respective GPUs, so presumably I can't be memory bound or similar. But if you're still in doubt, I was using this code, comparing the single precision and integer kernels, so let me know any issues you see with the benchmark.
despacer
-
Removing characters from strings faster with AVX-512
Cool performance enhancement, with an accompanying implementation in a real-world library (https://github.com/lemire/despacer).
Still, what does it signal that vector extensions are required to get better string performance on x86? Wouldn't it be better if Intel invested their AVX transistor budget into simply making existing REPB prefixes a lot faster?
-
Intel Nukes Alder Lake's AVX-512 Support, Now Fuses It Off in Silicon
If you're looking for an example, perhaps the despacer problem might be one which doesn't get too complex. Do you know of a way to implement it on a GPU such that it'd run better than (or at least as good as) a CPU SIMD implementation would?
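For readers unfamiliar with the despacer problem discussed in the comments above, here is a minimal scalar sketch in C. It assumes the task is to strip spaces, line feeds, and carriage returns from a buffer in place; the function name `despace_scalar` is hypothetical and is not taken from the despacer library's API. The SIMD versions discussed in the linked posts aim to do the same job many bytes at a time.

```c
#include <stddef.h>

/* Remove spaces, line feeds, and carriage returns from buf in place.
   Returns the new length; bytes beyond that length are unspecified.
   Hypothetical baseline for illustration, not the library's own code. */
size_t despace_scalar(char *buf, size_t len) {
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        /* Always write, then advance the output cursor only when the
           byte is kept. This avoids a branch around the store. */
        buf[out] = c;
        out += (c != ' ' && c != '\n' && c != '\r');
    }
    return out;
}
```

The caller is responsible for re-terminating the string, for example with `buf[n] = '\0'` where `n` is the returned length. A baseline like this is what a vectorized or GPU implementation would be measured against.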
What are some alternatives?
eaminer - Heterogeneous Ethereum Miner with support for AMD, Intel and Nvidia GPUs using SYCL, OpenCL and CUDA backends
rust - Empowering everyone to build reliable and efficient software.
AdaptiveCpp - Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
cglm - 📽 Highly Optimized 2D / 3D Graphics Math (glm) for C
gtensor - GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.
nsimd - Agenium Scale vectorization library for CPUs and GPUs
arbor - The Arbor multi-compartment neural network simulation library.
BabelStream - STREAM, for lots of devices written in many programming models