sse-popcount
Simd
| | sse-popcount | Simd |
|---|---|---|
| Mentions | 2 | 1 |
| Stars | 312 | 1,979 |
| Growth | - | - |
| Activity | 5.6 | 9.6 |
| Last commit | about 1 month ago | 4 days ago |
| Language | C++ | C++ |
| License | BSD 2-clause "Simplified" License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sse-popcount
- Fast bitset decoding using Intel AVX-512
https://developer.arm.com/documentation/ddi0596/2020-12/SIMD...
I believe it does 128 bits per instruction, but I'm still struggling with Rust and inline asm.
Along the way, however, I found this repo https://github.com/WojciechMula/sse-popcount/ which has tons of competing SIMD implementations for both Intel and Arm.
- Counting set bits in an interesting way
The built-in POPCNT instruction that came with Intel's SSE4.2 (SSE4a for AMD) is much faster. However, beyond a certain input size, using AVX2 (and AVX-512 if present) is faster still [1], at least for inputs of 512 bytes or larger.
[1]: https://github.com/WojciechMula/sse-popcount
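As a rough illustration of the scalar baseline that comment refers to, here is a minimal C++ sketch of counting set bits one 64-bit word at a time. It is not code from sse-popcount; the names `popcount64_swar`, `popcount64`, and `count_set_bits` are hypothetical. It assumes a GCC- or Clang-compatible compiler for `__builtin_popcountll`, which the compiler can typically lower to a single POPCNT instruction when the target supports it, and falls back to a portable bit-parallel (SWAR) routine elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable SWAR population count for one 64-bit word (no special instructions).
inline std::uint64_t popcount64_swar(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                            // sums of 2 bits
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // sums of 4 bits
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // sums of 8 bits
    return (x * 0x0101010101010101ULL) >> 56;                              // add all bytes
}

// Uses the compiler builtin where available (assumption: GCC or Clang),
// otherwise the SWAR fallback above.
inline std::uint64_t popcount64(std::uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return static_cast<std::uint64_t>(__builtin_popcountll(x));
#else
    return popcount64_swar(x);
#endif
}

// Hypothetical helper: counts the set bits in an arbitrary byte buffer.
inline std::uint64_t count_set_bits(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::uint64_t total = 0;
    std::size_t i = 0;
    // Main loop: 8 bytes per iteration; memcpy avoids unaligned-access issues.
    for (; i + 8 <= size; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof word);
        total += popcount64(word);
    }
    // Tail: remaining bytes one at a time.
    for (; i < size; ++i) {
        total += popcount64(data[i]);
    }
    return total;
}
```

A loop like this processes 8 bytes per step. The vectorized variants in the linked repository work on wider registers, which is where the reported advantage on larger inputs would come from.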
Simd
- The Case of the Missing SIMD Code
I was curious about these libraries a few weeks ago and did some searching. Is there one that's got a clearly dominating set of users or contributors?
I don't know what a good way to compare these might be, other than perhaps activity/contributor count.
[1] https://github.com/simd-everywhere/simde
[2] https://github.com/ermig1979/Simd
[3] https://github.com/google/highway
[4] https://gitlab.com/libeigen/eigen
[5] https://github.com/shibatch/sleef
What are some alternatives?
libsimdpp - Portable header-only C++ low level SIMD library
StringZilla - Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖
toys - Storage for my snippets, toy programs, etc.
MIPP - MIPP is a portable wrapper for SIMD instructions written in C++11. It supports NEON, SSE, AVX, AVX-512 and SVE (length specific).
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
eigen
Vc - SIMD Vector Classes for C++
mace - MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
fpng-java - Java Wrapper for the fast, native FPNG Encoder
oneDNN - oneAPI Deep Neural Network Library (oneDNN)