simdutf
riscv-v-spec
simdutf | riscv-v-spec | |
---|---|---|
11 | 43 | |
960 | 858 | |
4.4% | - | |
9.1 | 6.0 | |
3 days ago | about 2 months ago | |
C++ | Assembly | |
Apache License 2.0 | Creative Commons Attribution 4.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
simdutf
- Glibc Buffer Overflow in Iconv
-
Vectorizing Unicode conversions on real RISC-V hardware
The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
[0] https://github.com/simdutf/simdutf
-
Cray-1 performance vs. modern CPUs
I'm actually doing something quite similar in my, in progress, unicode conversion routines.
For utf8 validation there is a clever algorithm that uses three 4-bit look-ups to detect utf8 errors: https://github.com/simdutf/simdutf/blob/master/src/icelake/i...
Aside on LMUL, if you haven't encountered it yet: rvv allows you to group vector registers when configuring the vector configuration with vsetvl such that vector instruction operate on multiple vector registers at once. That is, with LMUL=1 you have v0,v1...v31. With LMUL=2 you effectively have v0,v2,...v30, where each vector register is twice as large. with LMUL=4 v0,v4,...v28, with LMUL=8 v0,v8,...v24.
In my code, I happen to read the data with LMUL=2. The trivial implementation would just call vrgather.vv with LMUL=2, but since we only need a lookup table with 128 bits, LMUL=1 would be enough to store the lookup table (V requires a minimum VLEN of 128 bits).
So instead I do six LMUL=1 vrgather.vv's instead of three LMUL=2 vrgather.vv's because there is no lane crossing required and this will run faster in hardware: (see [0] for a relevant mico benchmark)
# codegen for equivalent of that function
-
What C++ library do you wish existed but hasn’t been created yet?
utf8 normalization, stemming, case insensitive comparison. https://github.com/unicode-rs example for rust What are options for C++? 1. translate to utf16 ( https://github.com/simdutf/simdutf ) and use icu -- slow 2. boost text, https://github.com/tzlaine/text , also slow (because the author doesn't care or couldn't care), we made a lot of patches to make our library faster than lucene, but still this part is slower than icu for utf16 (icu for utf16 also very slow...)
-
[Preprint] Transcoding Unicode Characters with AVX-512 Instructions
You can find the corresponding assembly code in this repository. The main branch only contains implementations based on C++ with intrinsics.
-
What's everyone working on this week (10/2023)?
The next big thing is making it LSP-compatible. All language servers must implement UTF-16 based character offsets, which is kinda unfortunate considering that files are much more likely to be stored in UTF-8 (I think?). I don't want to do the UTF-8 -> UTF-16 transcoding, so instead I'll use the excellent simdutf library to count how much code points a UTF-8 string would take if it was transcoded into UTF-16 — which is much faster than actual transcoding. So this is what I'm going to do this week — rewriting parsers to produce UTF-16 offsets + some final benchmarking. After that is done, I'll consider the "research" part of this project completed and will start writing an actual Markdown parser.
-
Why would a language not natively support SIMD?
You can find the assembly code here: https://github.com/simdutf/simdutf/tree/clausecker The corresponding C++ code is in the main branch.
- High speed Unicode routines using SIMD
-
text-2.0-rc1 with UTF8 underlying representation is available for testing!
Or via an ultrafast simdutf.
- Simdutf: Unicode validation and transcoding at billions of characters per second
riscv-v-spec
-
Scaleway launches RISC-V servers
Here are some resources I can recommend:
RVV spec (also look at the examples in the repo): https://github.com/riscv/riscv-v-spec/blob/master/v-spec.ado...
RVV intrinsics viewer: https://dzaima.github.io/intrinsics-viewer
Tutorial: RISC-V Vector Extension Demystified (3 hour video going over every instruction): https://youtu.be/oTaOd8qr53U
RISC-V Vector extension in a nutshell: https://fprox.substack.com/p/risc-v-vector-extension-in-a-nu...
If you want to see a more complex example/real world application, then you might also be ibterested ib my article about vectorizing unicode conversions: https://camel-cdr.github.io/rvv-bench-results/articles/vecto...
In terms of development I'd recommend using qemu and a cross compiler, or if you want hardware try to get the kendryte k230 (currently the only sbc with rvv 1.0 support) or wait a bit for better hardware (BPI-F3 and sg2380 should release this year).
- Cray-1 performance vs. modern CPUs
-
x86 vs ARM; Vector and Matrix Extensions; How do they compare?
And this isn't just some theoretical or something unlikely to happen - the official spec already contains such a bug. If the writers of the spec can't get things right, even with the small amount of code in the spec, I don't have high hopes that less informed programmers will. RVV being absurdly complicated (IMO, compared to SVE2 and AVX10) doesn't help its cause here.
- riscv64 is now an official Debian architecture (rebootstrap in progress)
- Vector vs SIMD
-
LLVM's libc Gets Much Faster memcpy For RISC-V
Will the reference one actually be the most optimal one on future hardware?
- Is there any good place to find a copy-paste-able quick reference on RISC-V extensions? Particularly for the vector extension
-
Building a toolchain suitable for compiling V extension code
I'll do a deep dive into the https://gms.tf/riscv-vector.html#getting-started tutorial, and probably pop the proverbial stack and just study RVV 0.7.1 on its own (using https://github.com/riscv/riscv-v-spec/releases/tag/0.7.1).
-
A weird idea for using RV32E on a RV32I core - multithreaded microcontrollers?
I see your point. You can file a request for it at https://github.com/riscv/riscv-v-spec/issues if you want to pitch it to the relevant ISA bodies. The bar for implementing it pretty high.
-
Examining the Top Five Fallacies About RISC-V
It's not "unusual"; using data registers for mask is a valid tradeoff especially for low-end implementations, whereas higher-end architectures can easily use shadow registers. Discussed in depth at https://github.com/riscv/riscv-v-spec/issues/811
What are some alternatives?
simdutf8 - SIMD-accelerated UTF-8 validation for Rust.
riscv-p-spec - RISC-V Packed SIMD Extension
DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
simde - Implementations of SIMD instruction sets for systems which don't natively support them.
highway - Highway - A Modern Javascript Transitions Manager
eve - Expressive Vector Engine - SIMD in C++ Goes Brrrr
riscv-bitmanip - Working draft of the proposed RISC-V Bitmanipulation extension
Vc - SIMD Vector Classes for C++
vroom - VRoom! RISC-V CPU
simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
learn-fpga - Learning FPGA, yosys, nextpnr, and RISC-V