despacer vs mixbench
| | despacer | mixbench |
|---|---|---|
| Mentions | 2 | 1 |
| Stars | 147 | 339 |
| Growth | - | - |
| Activity | 5.6 | 5.2 |
| Last commit | 5 months ago | 2 months ago |
| Language | C | C++ |
| License | BSD 3-clause "New" or "Revised" License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
despacer
- Removing characters from strings faster with AVX-512
Cool performance enhancement, with an accompanying implementation in a real-world library (https://github.com/lemire/despacer).
Still, what does it signal that vector extensions are required to get better string performance on x86? Wouldn't it be better if Intel invested their AVX transistor budget into simply making existing REPB prefixes a lot faster?
- Intel Nukes Alder Lake's AVX-512 Support, Now Fuses It Off in Silicon
If you're looking for an example, perhaps the despacer problem might be one which doesn't get too complex. Do you know of a way to implement it on a GPU such that it'd run better than (or at least as good as) a CPU SIMD implementation would?
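For readers unfamiliar with the "despacer problem" mentioned above, here is a minimal scalar sketch in C of what it asks for: copy a buffer onto itself while dropping whitespace bytes. The function name `despace_scalar` is hypothetical and is not taken from the despacer library, and the choice of which bytes count as whitespace (space, line feed, carriage return) is an assumption made for illustration. SIMD or GPU variants would be measured against a baseline along these lines.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the despacer API.
 * Removes ' ', '\n' and '\r' from the first `len` bytes of `buf`
 * in place and returns the number of bytes kept.
 * The set of removed characters is an assumption for this sketch. */
static size_t despace_scalar(char *buf, size_t len) {
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        /* Always store, then advance the write cursor only when the
         * byte is kept. This avoids a data-dependent branch around
         * the store. */
        buf[out] = c;
        out += (size_t)(c != ' ' && c != '\n' && c != '\r');
    }
    return out;
}
```

Calling `despace_scalar` on the 9-byte buffer `"a b\r\nc  d"` leaves `abcd` in the first four bytes and returns 4. The function does not write a terminating NUL, so the caller has to add one if it needs a C string.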
mixbench
- Intel Nukes Alder Lake's AVX-512 Support, Now Fuses It Off in Silicon
The results I get match the FLOPS figures stated for the respective GPUs, so presumably I can't be memory bound or similar. But if you're still in doubt, I was using this code, comparing the single precision and integer kernels, so let me know any issues you see with the benchmark.
What are some alternatives?
rust - Empowering everyone to build reliable and efficient software.
eaminer - Heterogeneous Ethereum Miner with support for AMD, Intel and Nvidia GPUs using SYCL, OpenCL and CUDA backends
cglm - 📽 Highly Optimized 2D / 3D Graphics Math (glm) for C
AdaptiveCpp - Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
nsimd - Agenium Scale vectorization library for CPUs and GPUs
gtensor - GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.
arbor - The Arbor multi-compartment neural network simulation library.
BabelStream - STREAM, for lots of devices written in many programming models