simdutf
Vc
simdutf | Vc | |
---|---|---|
11 | 6 | |
960 | 1,420 | |
4.8% | 1.1% | |
9.1 | 6.1 | |
3 days ago | 3 months ago | |
C++ | C++ | |
Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
simdutf
- Glibc Buffer Overflow in Iconv
-
Vectorizing Unicode conversions on real RISC-V hardware
The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
[0] https://github.com/simdutf/simdutf
-
Cray-1 performance vs. modern CPUs
I'm actually doing something quite similar in my, in progress, unicode conversion routines.
For utf8 validation there is a clever algorithm that uses three 4-bit look-ups to detect utf8 errors: https://github.com/simdutf/simdutf/blob/master/src/icelake/i...
Aside on LMUL, if you haven't encountered it yet: rvv allows you to group vector registers when configuring the vector configuration with vsetvl such that vector instruction operate on multiple vector registers at once. That is, with LMUL=1 you have v0,v1...v31. With LMUL=2 you effectively have v0,v2,...v30, where each vector register is twice as large. with LMUL=4 v0,v4,...v28, with LMUL=8 v0,v8,...v24.
In my code, I happen to read the data with LMUL=2. The trivial implementation would just call vrgather.vv with LMUL=2, but since we only need a lookup table with 128 bits, LMUL=1 would be enough to store the lookup table (V requires a minimum VLEN of 128 bits).
So instead I do six LMUL=1 vrgather.vv's instead of three LMUL=2 vrgather.vv's because there is no lane crossing required and this will run faster in hardware: (see [0] for a relevant mico benchmark)
# codegen for equivalent of that function
-
What C++ library do you wish existed but hasn’t been created yet?
utf8 normalization, stemming, case insensitive comparison. https://github.com/unicode-rs example for rust What are options for C++? 1. translate to utf16 ( https://github.com/simdutf/simdutf ) and use icu -- slow 2. boost text, https://github.com/tzlaine/text , also slow (because the author doesn't care or couldn't care), we made a lot of patches to make our library faster than lucene, but still this part is slower than icu for utf16 (icu for utf16 also very slow...)
-
[Preprint] Transcoding Unicode Characters with AVX-512 Instructions
You can find the corresponding assembly code in this repository. The main branch only contains implementations based on C++ with intrinsics.
-
What's everyone working on this week (10/2023)?
The next big thing is making it LSP-compatible. All language servers must implement UTF-16 based character offsets, which is kinda unfortunate considering that files are much more likely to be stored in UTF-8 (I think?). I don't want to do the UTF-8 -> UTF-16 transcoding, so instead I'll use the excellent simdutf library to count how much code points a UTF-8 string would take if it was transcoded into UTF-16 — which is much faster than actual transcoding. So this is what I'm going to do this week — rewriting parsers to produce UTF-16 offsets + some final benchmarking. After that is done, I'll consider the "research" part of this project completed and will start writing an actual Markdown parser.
-
Why would a language not natively support SIMD?
You can find the assembly code here: https://github.com/simdutf/simdutf/tree/clausecker The corresponding C++ code is in the main branch.
- High speed Unicode routines using SIMD
-
text-2.0-rc1 with UTF8 underlying representation is available for testing!
Or via an ultrafast simdutf.
- Simdutf: Unicode validation and transcoding at billions of characters per second
Vc
-
The Bitter Truth: Python 3.11 vs Cython vs C++ Performance for Simulations
Most high-performance math libraries perform a lot of vectorization (Eigen, etc) under the hood. And you've got stuff like Klein, Vc (which is reminiscent of std::valarray), etc. Then there's OpenMP's #pragma omp simd (assuming version 4.0 or greater).
-
John "God" Carmack: C++ with a C flavor is still the best (also: Python performance "keeps hitting me in the face")
I personally like the ideas in Parallelism v2 TS, which is available in for libstdc++ 11 onwards. The reference implementation is a library named Vc (afaik Vc is the most popular SIMD library for C++), and this has also been implemented in recent versions of HPX.
-
SPO600 project part 2
First of all about our project, I previously decided to work with VC library.https://github.com/VcDevel/Vc
-
SPO600 project part 1
I've decided to switch to something better, and after a few hours of searching, I found this repository: NSIMD https://github.com/agenium-scale/nsimd FastDifferentialCoding https://github.com/lemire/FastDifferentialCoding VS https://github.com/VcDevel/Vc XSIMD https://github.com/xtensor-stack/xsimd
- Vc 1.4.2 released: portable SIMD programming for C++
-
All C++20 core language features with examples
> - Waiting for Cross-Platform standardized SIMD vector datatypes
which language has standardized SIMD vector datatypes ? most languages don't even have any ability to express SIMD while in C++ I can just use Vc (https://github.com/VcDevel/Vc), nsimd (https://github.com/agenium-scale/nsimd) or one of the other ton of alternatives, and have stuff that JustWorksTM on more architectures than most languages even support
- Using nonstandard extensions, libraries or home-baked solutions to run computations in parallel on many cores or on different processors than the CPU
what are the other native languages with a standardized memory model for atomics ? and, what's the problem with using libraries ? it's not like you're going to use C# or Java's built-in threadpools if you are doing any serious work, no ? Do they even have something as easy to use as https://github.com/taskflow/taskflow ?
- Debugging cross-platform code using couts, cerrs and printfs
because people never use console.log in JS or System.println in C# maybe ?
- Forced to use boost for even quite elementary operations on std::strings.
can you point to non-trivial java projects that do not use Apache Commons ? Also, the boost string algorithms are header-only so you will end up with exactly the same binaries that if it was in some std::string_algorithms namespace:
https://gcc.godbolt.org/z/43vKadbde
What are some alternatives?
simdutf8 - SIMD-accelerated UTF-8 validation for Rust.
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
xsimd - C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE))
simde - Implementations of SIMD instruction sets for systems which don't natively support them.
Eigen
eve - Expressive Vector Engine - SIMD in C++ Goes Brrrr
blaze
simdjson - Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
MIRACL - MIRACL Cryptographic SDK: Multiprecision Integer and Rational Arithmetic Cryptographic Library is a C software library that is widely regarded by developers as the gold standard open source SDK for elliptic curve cryptography (ECC).
colrcv - C Library for converting Colours between different Colour Models
GLM - OpenGL Mathematics (GLM)