-
It should be part of these discussions to prove what you claim. Always. With code samples, fed directly to the compiler, and the corresponding assembly.
https://godbolt.org/
Statistics alone are worthless. In the end, all that counts is the arena of performance: what the code becomes and how it runs against the handcrafted version.
-
-
Bad news: there are no cross-platform SIMD intrinsics. Intel intrinsics map directly to SSE/AVX instructions, and ARM intrinsics map directly to NEON instructions.
For cross-platform, your best bet is probably https://github.com/VcDevel/std-simd
There's https://eigen.tuxfamily.org/index.php?title=Main_Page but it's tremendously complicated for anything other than large-scale linear algebra.
And there's https://github.com/microsoft/DirectXMath but it has obvious biases :P
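To make the "no cross-platform intrinsics" point concrete, here is a minimal sketch of the same element-wise float add written once per instruction set. The function name add_arrays is made up for this example, and the preprocessor checks assume a GCC/Clang/MSVC-style toolchain. Treat it as an illustration, not a recommended abstraction layer.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>   // Intel SSE/SSE2 intrinsics
#elif defined(__ARM_NEON)
  #include <arm_neon.h>    // ARM NEON intrinsics
#endif

// out[i] = a[i] + b[i] for i in [0, n).
// add_arrays is a hypothetical helper used only for this sketch.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__) || defined(_M_X64)
    // x86 path: 4 floats per iteration, unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#elif defined(__ARM_NEON)
    // ARM path: same operation, entirely different types and names.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar tail; also the whole loop when neither path is compiled in.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Every additional target (AVX2, AVX-512, SVE, ...) means another branch like these, which is the duplication that wrapper libraries such as the ones linked above try to hide.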
-
-
__builtin_shufflevector requires a vector length known at compile time, and it can be pessimized: the compiler may fuse two shuffles into one general all-to-all permute, which is more expensive than two simple shuffles.
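For anyone who hasn't used it, a rough sketch of what that looks like with the GCC/Clang vector extension. The names v4f, interleave_low, reverse_lanes and two_shuffles are made up for this example. __builtin_shufflevector is a Clang builtin that newer GCC releases also accept, so the sketch falls back to plain lane indexing when the builtin is not reported as available.

```cpp
// Fixed-length vector type: 4 floats in 16 bytes. The length is part of the
// type, which is why this approach does not extend to scalable vectors.
typedef float v4f __attribute__((vector_size(16)));

#ifndef __has_builtin
#  define __has_builtin(x) 0   // older compilers: assume the builtin is missing
#endif

// Interleave the low halves of a and b: {a0, b0, a1, b1}.
v4f interleave_low(v4f a, v4f b) {
#if __has_builtin(__builtin_shufflevector)
    // Indices must be compile-time constants: 0..3 pick from a, 4..7 from b.
    return __builtin_shufflevector(a, b, 0, 4, 1, 5);
#else
    v4f r = {a[0], b[0], a[1], b[1]};
    return r;
#endif
}

// Reverse the lane order: {a3, a2, a1, a0}.
v4f reverse_lanes(v4f a) {
#if __has_builtin(__builtin_shufflevector)
    return __builtin_shufflevector(a, a, 3, 2, 1, 0);
#else
    v4f r = {a[3], a[2], a[1], a[0]};
    return r;
#endif
}

// Two simple shuffles back to back. The optimizer is free to merge these
// into a single permute; whether that is cheaper depends on the target.
v4f two_shuffles(v4f a, v4f b) {
    return reverse_lanes(interleave_low(a, b));
}
```

To see what a given compiler actually does with two_shuffles, paste it into https://godbolt.org/ and compare the output across targets and optimization levels.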
Also, vqsort (https://github.com/google/highway/tree/master/hwy/contrib/so...) almost entirely consists of