Metric | volk | sliceslice-rs |
---|---|---|
Mentions | 2 | 2 |
Stars | 512 | 87 |
Growth | 1.4% | - |
Activity | 8.9 | 5.9 |
Last commit | about 1 month ago | 3 months ago |
Language | C++ | Rust |
License | GNU Lesser General Public License v3.0 only | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
volk
- RISC-V Business: Testing StarFive's VisionFive 2 SBC
I wonder how much the performance will improve when compilers get better at RISC-V.
It's been a long time since I could beat the compiler at optimizing assembly on x86, yet here, merely by unrolling a loop and keeping an eye on write-read stalls, I managed to make a simple "multiply array by const" about 56% faster:
https://github.com/gnuradio/volk/pull/619
And that's with hardware that doesn't even have vector instructions! I'd understand GCC not supporting that yet.
Some other quickstart docs and hot takes from me on this hardware: https://blog.habets.se/2023/01/VisionFive-2-quickstart.html
- AVX/AVX-512 Tuning Doesn't Payoff for LibreOffice's Calc Spreadsheets
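The RISC-V comment above credits its speedup to unrolling a "multiply array by const" loop and avoiding write-read stalls. As a rough illustration of that idea only, here is a minimal Rust sketch. It is not the code from the linked volk pull request, and the function names `scale_simple` and `scale_unrolled` are hypothetical. Whether the unrolled form is actually faster depends on the compiler and the hardware, since an optimizing compiler may already unroll the simple loop.

```rust
/// Baseline: one multiply and one store per iteration.
fn scale_simple(dst: &mut [f32], src: &[f32], k: f32) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d = *s * k;
    }
}

/// Manually unrolled by four (hypothetical helper, for illustration).
/// The four products are computed into separate locals before any store,
/// so no multiply has to wait on a value written by the previous one.
fn scale_unrolled(dst: &mut [f32], src: &[f32], k: f32) {
    // Work on the common prefix so both slices have the same length.
    let n = dst.len().min(src.len());
    let (dst, src) = (&mut dst[..n], &src[..n]);

    let mut d_chunks = dst.chunks_exact_mut(4);
    let mut s_chunks = src.chunks_exact(4);

    for (d, s) in d_chunks.by_ref().zip(s_chunks.by_ref()) {
        let (a, b, c, e) = (s[0] * k, s[1] * k, s[2] * k, s[3] * k);
        d[0] = a;
        d[1] = b;
        d[2] = c;
        d[3] = e;
    }

    // Handle the 0 to 3 elements that did not fill a whole chunk.
    let d_tail = d_chunks.into_remainder();
    let s_tail = s_chunks.remainder();
    for (d, s) in d_tail.iter_mut().zip(s_tail) {
        *d = *s * k;
    }
}
```

Both functions should produce identical output for the same input; only the instruction scheduling the compiler is nudged toward differs.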
sliceslice-rs
- Memchr 2.4 now has an implementation of substring search on arbitrary bytes
Aside from that, their SIMD implementation is better optimized than the one I wrote. Aside from the codegen problem I talked about on that PR, sliceslice does better with its confirmation step by specializing calls to memcmp for all needles up to length 16. This repeats the entire implementation 16 times or so (for each of SSE2 and AVX2, so 32 in total I believe), but lets the memcmp call be a bit better than a generic one. We could do the same in memchr, but I wanted to see how much mileage we could get with fewer copies of the code and a lower latency implementation of memcmp.
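The comment above describes specializing the confirmation step for each short needle length, at the cost of repeating the implementation once per length. A minimal sketch of that trade-off using Rust const generics follows. It is not sliceslice's or memchr's actual code: the names `find`, `find_fixed`, and `find_generic` are hypothetical, only lengths 1 to 4 are specialized here rather than up to 16, and a plain window scan stands in for the SIMD candidate search.

```rust
/// Search with the needle length N fixed at compile time. Each N used in
/// `find` produces its own copy of this function, and the comparison inside
/// it has a constant length the compiler can turn into fixed-width loads.
fn find_fixed<const N: usize>(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // Copy the needle into a fixed-size array; panics if it is shorter than N.
    let mut fixed = [0u8; N];
    fixed.copy_from_slice(&needle[..N]);
    haystack.windows(N).position(|w| w == &fixed[..])
}

/// Fallback for longer needles: a single copy of the code, with a
/// comparison whose length is only known at run time.
fn find_generic(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Dispatch once on the needle length, then run the specialized search.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    match needle.len() {
        0 => Some(0),
        1 => find_fixed::<1>(haystack, needle),
        2 => find_fixed::<2>(haystack, needle),
        3 => find_fixed::<3>(haystack, needle),
        4 => find_fixed::<4>(haystack, needle),
        _ => find_generic(haystack, needle),
    }
}
```

Every extra arm in `find` adds another monomorphized copy of the search, which is the code-size cost the comment weighs against a faster comparison.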
What are some alternatives?
xsimd - C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
regex-automata - A low level regular expression library that uses deterministic finite automata.
riscv-profiles - RISC-V Architecture Profiles
nsimd - Agenium Scale vectorization library for CPUs and GPUs
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
regex - An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
linux-on-litex-vexriscv - Linux on LiteX-VexRiscv
rust-memchr - Optimized string search routines for Rust.
GLM - OpenGL Mathematics (GLM)
ripgrep - ripgrep recursively searches directories for a regex pattern while respecting your gitignore
Vc - SIMD Vector Classes for C++