sse-popcount vs libsimdpp
Metric | sse-popcount | libsimdpp
---|---|---
Mentions | 2 | 1
Stars | 312 | 1,194
Growth | - | -
Activity | 5.6 | 0.0
Latest commit | about 1 month ago | 5 months ago
Language | C++ | C++
License | BSD 2-clause "Simplified" License | Boost Software License 1.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sse-popcount
Fast bitset decoding using Intel AVX-512
https://developer.arm.com/documentation/ddi0596/2020-12/SIMD...
I believe it does 128 bits per instruction, but I'm still struggling with Rust's inline asm.
Along the way, however, I found this repo, https://github.com/WojciechMula/sse-popcount/, which has tons of competing SIMD implementations for both Intel and ARM.
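On the ARM side, AArch64's `CNT` instruction (NEON intrinsic `vcntq_u8`) counts the set bits in each byte of a 128-bit register, which fits the "128 bits per instruction" guess above; `ADDV` (`vaddvq_u8`) then sums the 16 byte counts. Below is a minimal C++ sketch, assuming an AArch64 target for the vector path (other targets fall back to a per-byte loop). `popcount_neon` is a hypothetical name, not an API from sse-popcount.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from sse-popcount): count the set bits in a byte buffer.
// On AArch64 the main loop covers 16 bytes (128 bits) per CNT instruction.
std::uint64_t popcount_neon(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::uint64_t total = 0;
    std::size_t i = 0;
#if defined(__aarch64__)
    for (; i + 16 <= n; i += 16) {
        uint8x16_t v = vld1q_u8(data + i);  // load 128 bits
        uint8x16_t counts = vcntq_u8(v);    // CNT: bit count of each byte (0..8)
        total += vaddvq_u8(counts);         // ADDV: sum of 16 byte counts (at most 128)
    }
#endif
    // Scalar tail, or the whole buffer on non-AArch64 targets:
    // clear the lowest set bit until the byte is zero (Kernighan's method).
    for (; i < n; ++i) {
        std::uint8_t b = data[i];
        while (b != 0) {
            b = static_cast<std::uint8_t>(b & (b - 1));
            ++total;
        }
    }
    return total;
}
```

With a 20-byte buffer, the vector loop handles the first 16 bytes and the scalar loop counts the remaining 4.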
Counting set bits in an interesting way
The built-in POPCNT instruction, which Intel introduced alongside SSE4.2 (and AMD alongside SSE4a), is much faster. However, for large enough inputs, AVX2 (and AVX-512, if present) is faster still [1], at least for inputs of 512 bytes or larger.
[1]: https://github.com/WojciechMula/sse-popcount
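To make the POPCNT-versus-AVX2 comparison concrete, here is a minimal sketch of the nibble-lookup idea, one of the techniques the sse-popcount repository benchmarks. `_mm256_shuffle_epi8` (`vpshufb`) acts as a 16-entry table of bit counts that is applied to the low and high nibble of every byte, and `_mm256_sad_epu8` (`vpsadbw`) sums the byte counts into four 64-bit lanes, so each iteration covers 32 bytes. The sketch assumes an x86-64 target and GCC or Clang (for `__attribute__((target("avx2")))` and `__builtin_cpu_supports`). The function names are hypothetical, and the code is not copied from the repository.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define POPCOUNT_HAVE_AVX2_PATH 1
#endif

// Scalar path: one 64-bit count per word. std::bitset::count is typically
// compiled to the POPCNT instruction when it is enabled (e.g. -mpopcnt).
static std::uint64_t popcount_scalar(const std::uint8_t* data, std::size_t n) {
    std::uint64_t total = 0;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof word);  // unaligned-safe load
        total += std::bitset<64>(word).count();
    }
    for (; i < n; ++i) total += std::bitset<8>(data[i]).count();
    return total;
}

#ifdef POPCOUNT_HAVE_AVX2_PATH
__attribute__((target("avx2")))
static std::uint64_t popcount_avx2(const std::uint8_t* data, std::size_t n) {
    // Bit counts of the nibble values 0..15, repeated per 128-bit lane
    // because vpshufb looks up within each lane separately.
    const __m256i lookup = _mm256_setr_epi8(
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
    const __m256i low_mask = _mm256_set1_epi8(0x0f);
    __m256i acc = _mm256_setzero_si256();
    std::size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        const __m256i lo = _mm256_and_si256(v, low_mask);                         // low nibbles
        const __m256i hi = _mm256_and_si256(_mm256_srli_epi16(v, 4), low_mask);  // high nibbles
        const __m256i counts = _mm256_add_epi8(_mm256_shuffle_epi8(lookup, lo),
                                               _mm256_shuffle_epi8(lookup, hi));
        // Sum each group of 8 byte counts into a 64-bit lane and accumulate.
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(counts, _mm256_setzero_si256()));
    }
    alignas(32) std::uint64_t lanes[4];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
    // The scalar path handles the final n % 32 bytes.
    return lanes[0] + lanes[1] + lanes[2] + lanes[3] + popcount_scalar(data + i, n - i);
}
#endif

// Runtime dispatch: use the AVX2 path only if the CPU reports AVX2 support.
std::uint64_t popcount_buffer(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
#ifdef POPCOUNT_HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) return popcount_avx2(data, n);
#endif
    return popcount_scalar(data, n);
}
```

Because the scalar function also handles the leftover bytes, callers can pass buffers of any length. The `target` attribute lets the AVX2 function compile without building the whole program with `-mavx2`.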
libsimdpp
What are some alternatives?
toys - Storage for my snippets, toy programs, etc.
xsimd - C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
simde - Implementations of SIMD instruction sets for systems which don't natively support them.
Vc - SIMD Vector Classes for C++
VectorizedKernel - Running GPGPU-like kernels on CPU with auto-vectorization for SSE/AVX/AVX512 SIMD Architectures