simdutf vs DirectXMath
Metric | simdutf | DirectXMath
---|---|---
Mentions | 11 | 13
Stars | 960 | 1,481
Growth | 4.8% | 0.3%
Activity | 9.1 | 6.6
Latest commit | 3 days ago | 30 days ago
Language | C++ | C++
License | Apache License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
simdutf
- Glibc Buffer Overflow in Iconv
-
Vectorizing Unicode conversions on real RISC-V hardware
The project was mostly inspired by simdutf [0], which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
[0] https://github.com/simdutf/simdutf
-
Cray-1 performance vs. modern CPUs
I'm actually doing something quite similar in my in-progress Unicode conversion routines.
For UTF-8 validation, there is a clever algorithm that uses three 4-bit lookups to detect UTF-8 errors: https://github.com/simdutf/simdutf/blob/master/src/icelake/i...
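The linked source is truncated here, so the following is a scalar C++ model of the nibble-lookup technique (the Keiser-Lemire "lookup" validation algorithm), not simdutf's actual code. For each byte, three 16-entry tables, indexed by the previous byte's high nibble, the previous byte's low nibble, and the current byte's high nibble, are ANDed; any surviving bit names an error class. A second check flags continuation runs that do not match a 3- or 4-byte lead. The table values are an assumption modeled on the open-source simdjson/simdutf tables; check them against the linked source. In SIMD code each table lookup becomes one byte shuffle over a whole vector instead of a per-byte array access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Error classes, one bit each (TOO_LARGE_1000 and OVERLONG_4 intentionally share bit 6).
constexpr uint8_t TOO_SHORT = 1 << 0, TOO_LONG = 1 << 1, OVERLONG_3 = 1 << 2,
                  TOO_LARGE = 1 << 3, SURROGATE = 1 << 4, OVERLONG_2 = 1 << 5,
                  TOO_LARGE_1000 = 1 << 6, OVERLONG_4 = 1 << 6, TWO_CONTS = 1 << 7;
constexpr uint8_t CARRY = TOO_SHORT | TOO_LONG | TWO_CONTS;

// Indexed by the high nibble of the previous byte.
constexpr uint8_t kByte1High[16] = {
    TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG, TOO_LONG,  // 0xxx
    TWO_CONTS, TWO_CONTS, TWO_CONTS, TWO_CONTS,                                      // 10xx
    TOO_SHORT | OVERLONG_2,                                                          // 1100
    TOO_SHORT,                                                                       // 1101
    TOO_SHORT | OVERLONG_3 | SURROGATE,                                              // 1110
    TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4};                            // 1111

// Indexed by the low nibble of the previous byte.
constexpr uint8_t kByte1Low[16] = {
    CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,  // ____0000
    CARRY | OVERLONG_2,                            // ____0001
    CARRY, CARRY,                                  // ____001_
    CARRY | TOO_LARGE,                             // ____0100
    CARRY | TOO_LARGE | TOO_LARGE_1000,            // ____0101
    CARRY | TOO_LARGE | TOO_LARGE_1000,            // ____0110
    CARRY | TOO_LARGE | TOO_LARGE_1000,            // ____0111
    CARRY | TOO_LARGE | TOO_LARGE_1000, CARRY | TOO_LARGE | TOO_LARGE_1000,
    CARRY | TOO_LARGE | TOO_LARGE_1000, CARRY | TOO_LARGE | TOO_LARGE_1000,
    CARRY | TOO_LARGE | TOO_LARGE_1000,              // ____1000 .. ____1100
    CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,  // ____1101
    CARRY | TOO_LARGE | TOO_LARGE_1000, CARRY | TOO_LARGE | TOO_LARGE_1000};

// Indexed by the high nibble of the current byte.
constexpr uint8_t kByte2High[16] = {
    TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT,
    TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,  // 1000
    TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,                    // 1001
    TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,                     // 1010
    TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,                     // 1011
    TOO_SHORT, TOO_SHORT, TOO_SHORT, TOO_SHORT};                                   // 11xx

// Scalar model: one step per byte, where SIMD code would do one step per vector.
bool utf8_valid_lookup(std::string_view s) {
  uint8_t prev1 = 0, prev2 = 0, prev3 = 0;  // bytes before the input behave like ASCII
  // Iterate one past the end: a virtual 0x00 exposes a truncated trailing sequence.
  for (std::size_t i = 0; i <= s.size(); ++i) {
    const uint8_t cur = i < s.size() ? static_cast<uint8_t>(s[i]) : 0;
    const uint8_t sc = kByte1High[prev1 >> 4] & kByte1Low[prev1 & 0x0F] & kByte2High[cur >> 4];
    // Third/fourth bytes of 3-/4-byte sequences must be continuation-after-continuation.
    const uint8_t must23_80 = (prev2 >= 0xE0 || prev3 >= 0xF0) ? 0x80 : 0;
    if ((sc ^ must23_80) != 0) return false;
    prev3 = prev2;
    prev2 = prev1;
    prev1 = cur;
  }
  return true;
}
```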
Aside on LMUL, in case you haven't encountered it yet: RVV allows you to group vector registers when setting the vector configuration with vsetvl, so that vector instructions operate on multiple vector registers at once. That is, with LMUL=1 you have v0, v1, ..., v31. With LMUL=2 you effectively have v0, v2, ..., v30, where each vector register is twice as large; with LMUL=4, v0, v4, ..., v28; and with LMUL=8, v0, v8, v16, v24.
In my code, I happen to read the data with LMUL=2. The trivial implementation would just call vrgather.vv with LMUL=2, but since the lookup table only needs 128 bits (16 one-byte entries), LMUL=1 is enough to hold it (V requires a minimum VLEN of 128 bits).
So instead of three LMUL=2 vrgather.vv's, I do six LMUL=1 vrgather.vv's, because no lane crossing is required and this runs faster on hardware (see [0] for a relevant micro benchmark).
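To make the "no lane crossing" argument concrete, here is a scalar C++ model, not RVV intrinsics. `vrgather_model` is a hypothetical stand-in for vrgather.vv at SEW=8 (indices at or beyond VLMAX yield 0), and VLEN is assumed to be 128 bits, so one register holds 16 bytes and an LMUL=2 group holds 32. Because every index is a nibble (0-15), each output byte only ever reads the 16-byte table, so gathering each half of an LMUL=2 index group separately against an LMUL=1 table gives the same bytes as one LMUL=2 gather.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of vrgather.vv at SEW=8: out[i] = idx[i] < VLMAX ? src[idx[i]] : 0,
// where VLMAX is the element count of the source register (group).
std::vector<uint8_t> vrgather_model(const std::vector<uint8_t>& src,
                                    const std::vector<uint8_t>& idx) {
  std::vector<uint8_t> out(idx.size());
  for (std::size_t i = 0; i < idx.size(); ++i)
    out[i] = idx[i] < src.size() ? src[idx[i]] : 0;
  return out;
}

constexpr std::size_t kBytesPerReg = 16;  // assumed VLEN = 128 bits, SEW = 8

// One LMUL=2 gather: table and indices both occupy a 2-register group.
std::vector<uint8_t> lookup_lmul2(const std::vector<uint8_t>& table16,
                                  const std::vector<uint8_t>& idx32) {
  std::vector<uint8_t> table_group(table16);
  table_group.resize(2 * kBytesPerReg, 0);  // upper register of the group is never indexed
  return vrgather_model(table_group, idx32);
}

// Two LMUL=1 gathers: each 16-byte half of the indices reads the same 16-byte table.
std::vector<uint8_t> lookup_split_lmul1(const std::vector<uint8_t>& table16,
                                        const std::vector<uint8_t>& idx32) {
  const std::vector<uint8_t> lo(idx32.begin(), idx32.begin() + kBytesPerReg);
  const std::vector<uint8_t> hi(idx32.begin() + kBytesPerReg, idx32.end());
  std::vector<uint8_t> out = vrgather_model(table16, lo);
  const std::vector<uint8_t> out_hi = vrgather_model(table16, hi);
  out.insert(out.end(), out_hi.begin(), out_hi.end());
  return out;
}
```

Applied to each of the three nibble lookups, this split is where six LMUL=1 gathers replace three LMUL=2 ones; the speed benefit itself is the post's hardware claim, which this model does not measure.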
-
What C++ library do you wish existed but hasn’t been created yet?
UTF-8 normalization, stemming, and case-insensitive comparison; see https://github.com/unicode-rs for an example in Rust. What are the options for C++?
1. Translate to UTF-16 (https://github.com/simdutf/simdutf) and use ICU: slow.
2. Boost Text (https://github.com/tzlaine/text): also slow (because the author doesn't care or couldn't care). We made a lot of patches to make our library faster than Lucene, but this part is still slower than ICU on UTF-16 (and ICU on UTF-16 is also very slow...).
-
[Preprint] Transcoding Unicode Characters with AVX-512 Instructions
You can find the corresponding assembly code in this repository. The main branch only contains implementations based on C++ with intrinsics.
-
What's everyone working on this week (10/2023)?
The next big thing is making it LSP-compatible. All language servers must implement UTF-16-based character offsets, which is kinda unfortunate considering that files are much more likely to be stored in UTF-8 (I think?). I don't want to do the UTF-8 -> UTF-16 transcoding, so instead I'll use the excellent simdutf library to count how many UTF-16 code units a UTF-8 string would take if it were transcoded to UTF-16, which is much faster than actual transcoding. So this is what I'm going to do this week: rewriting parsers to produce UTF-16 offsets, plus some final benchmarking. After that is done, I'll consider the "research" part of this project complete and will start writing an actual Markdown parser.
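simdutf exposes this count as `simdutf::utf16_length_from_utf8(const char*, size_t)` and computes it with SIMD; the arithmetic behind it is simple enough to show as a scalar sketch. The helper name `utf16_units_from_utf8` is hypothetical, and it assumes the input has already been validated as UTF-8. Every byte that is not a continuation byte (`10xxxxxx`) starts one code point and therefore one UTF-16 code unit, and every 4-byte lead (`11110xxx`) needs one extra unit for the surrogate pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical scalar helper: UTF-16 code units needed for a *valid* UTF-8 string.
std::size_t utf16_units_from_utf8(std::string_view s) {
  std::size_t units = 0;
  for (unsigned char c : s) {
    if ((c & 0xC0) != 0x80) ++units;  // not a continuation byte: starts a code point
    if (c >= 0xF0) ++units;           // 4-byte lead: code point needs a surrogate pair
  }
  return units;
}
```

For LSP positions, the UTF-16 `character` offset of a byte position is then the count over the line's prefix up to that byte, so no UTF-16 buffer is ever materialized.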
-
Why would a language not natively support SIMD?
You can find the assembly code here: https://github.com/simdutf/simdutf/tree/clausecker The corresponding C++ code is in the main branch.
- High-speed Unicode routines using SIMD
-
text-2.0-rc1 with UTF8 underlying representation is available for testing!
Or via the ultrafast simdutf.
- Simdutf: Unicode validation and transcoding at billions of characters per second
DirectXMath
-
Vector math library benchmarks (C++)
For those unfamiliar, like I was, DXM is DirectXMath.
-
Learning DirectX 12 in 2023
Alongside MiniEngine, you’ll want to look into the DirectX Toolkit. This is a set of utilities by Microsoft that simplify graphics and game development. It contains libraries like DirectXMesh for parsing and optimizing meshes for DX12, or DirectXMath, which handles 3D math operations like the OpenGL library GLM. It also has utilities for gamepad input and sprite fonts. You can see a list of the headers here to get an idea of the features. You’ll definitely want to include this in your project if you don’t want to think about a lot of these solved problems (and don’t have to worry about cross-platform support).
-
Optimizing compilers reload vector constants needlessly
Bad news. For SIMD, there are no cross-platform intrinsics. Intel intrinsics map directly to SSE/AVX instructions, and ARM intrinsics map directly to NEON instructions.
For cross-platform, your best bet is probably https://github.com/VcDevel/std-simd
There's Eigen (https://eigen.tuxfamily.org/index.php?title=Main_Page), but it's tremendously complicated for anything other than large-scale linear algebra.
And there's https://github.com/microsoft/DirectXMath, but it has obvious biases :P
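VcDevel/std-simd is the standalone implementation of `std::experimental::simd` from the C++ Parallelism TS 2, and the same header ships with libstdc++ from GCC 11 on. A minimal sketch of the portable style it enables, assuming GCC 11+ (`dot` is a hypothetical example function): the kernel is written once against `native_simd<float>`, and the compiler picks the target's native vector width, so the same source can become SSE/AVX on x86 or NEON on ARM.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Portable SIMD dot product: full vectors first, then a scalar tail.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
  using V = stdx::native_simd<float>;
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  V acc = 0.0f;  // broadcast 0 to every lane
  std::size_t i = 0;
  for (; i + V::size() <= n; i += V::size()) {
    const V va(&a[i], stdx::element_aligned);  // unaligned loads
    const V vb(&b[i], stdx::element_aligned);
    acc += va * vb;
  }
  float sum = stdx::reduce(acc);  // horizontal add across lanes
  for (; i < n; ++i) sum += a[i] * b[i];
  return sum;
}
```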
-
MATHRIL - Custom math library for game programming
I am not in gamedev, but I work with 3D graphics. We use DirectX 11, so DirectXMath was a natural choice: it is header-only, it supports SIMD instructions (SSE, AVX, NEON, etc.), and it can even be used on Linux (it has no dependence on Windows). It is, of course, just one choice: https://github.com/Microsoft/DirectXMath.
- When I had to look up what a Quaternion is
-
Eigen: A C++ template library for linear algebra
I never really used GLM, but Eigen was substantially slower than DirectXMath (https://github.com/microsoft/DirectXMath) for these things. Despite the name, 99% of that library is OS-agnostic; only a few small pieces (like the projection matrix formula) are specific to Direct3D. When enabled with the corresponding macros, inline functions from that library normally compile into pretty efficient, manually vectorized SSE, AVX, or NEON code.
The only major issue: DirectXMath doesn’t support FP64 precision.
-
maths - templated c++ linear algebra library with vector swizzling, intersection tests and useful functions for games and graphics dev... includes live webgl/wasm demo ?
If you’re the author, consider comparisons with the industry standards, glm and DirectXMath, which both ensure easy interoperability with the two graphics APIs.
-
Algorithms for division: Using Newton's method
Good article, but note that if the hardware supports a division instruction, it will be much faster than the described workarounds.
Personally, I recently did what’s described in 2 cases: FP32 division on ARMv7, and FP64 division on GPUs that don’t support that instruction.
ARM CPUs not only have FRECPE (reciprocal estimate), they also have FRECPS for the iteration step. An example: https://github.com/microsoft/DirectXMath/blob/jan2021/Inc/Di...
For GPUs, Microsoft classified FP64 division as “extended double shader instruction” and the support is optional. However, GPUs are guaranteed to support FP32 division. The result of FP32 division provides an awesome starting point for Newton-Raphson refinement in FP64 precision.
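A minimal C++ sketch of that FP64 recipe on a CPU, assuming `d` is nonzero and within FP32 range (`divide_fp64` is a hypothetical name, not the poster's shader code). The FP32 reciprocal gives about 24 correct bits; each Newton-Raphson step on f(x) = 1/x - d, written as x ← x(2 - dx), roughly doubles that, and a final remainder correction fixes up the quotient. The `2 - d*x` factor is the same quantity ARM's FRECPS computes for the FP32 case.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: n / d in FP64 from an FP32 reciprocal plus Newton-Raphson.
double divide_fp64(double n, double d) {
  // Starting point: FP32 division, widened to double (~24 correct bits).
  double x = static_cast<double>(1.0f / static_cast<float>(d));
  // Two refinement steps: ~24 -> ~48 -> full double precision.
  for (int i = 0; i < 2; ++i) {
    const double e = std::fma(-d, x, 1.0);  // e = 1 - d*x
    x = std::fma(x, e, x);                  // x + x*e == x * (2 - d*x)
  }
  const double q = n * x;
  const double r = std::fma(-d, q, n);      // remainder of the approximate quotient
  return std::fma(r, x, q);                 // corrected quotient
}
```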
-
Use of BLAS vs direct SIMD for linear algebra library operations?
For graphics, DX math is a very good library.
-
Speeding Up `Atan2f` by 50x
I wonder how it compares with Microsoft’s implementation: https://github.com/microsoft/DirectXMath/blob/jan2021/Inc/Di...
Based on the code, your version is probably much faster. It would still be interesting to compare precision; MS uses a 17-degree polynomial there.
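To illustrate the speed-versus-precision trade-off being discussed, here is a sketch of a cheap `atan2f` (`fast_atan2f` and `atan_unit` are hypothetical names). It is neither the article's method nor DirectXMath's 17-degree polynomial; it swaps in a widely used low-order approximation, atan(x) ≈ (π/4)x - x(|x| - 1)(0.2447 + 0.0663|x|) on [-1, 1], whose maximum error is roughly 0.0015 rad. Range reduction swaps x and y when |y| > |x| so the polynomial only sees ratios in [-1, 1], then quadrant fixups restore the full (-π, π] range.

```cpp
#include <cassert>
#include <cmath>

// Low-order atan approximation, valid for x in [-1, 1].
float atan_unit(float x) {
  const float ax = std::fabs(x);
  return 0.78539816f * x - x * (ax - 1.0f) * (0.2447f + 0.0663f * ax);
}

float fast_atan2f(float y, float x) {
  const float pi = 3.14159265f, half_pi = 1.57079633f;
  if (x == 0.0f && y == 0.0f) return 0.0f;
  if (std::fabs(x) >= std::fabs(y)) {
    const float a = atan_unit(y / x);  // in [-pi/4, pi/4]
    if (x > 0.0f) return a;
    return y >= 0.0f ? a + pi : a - pi;  // left half-plane
  }
  // |y| > |x|: atan2(y, x) = sign(y) * pi/2 - atan(x / y)
  return (y > 0.0f ? half_pi : -half_pi) - atan_unit(x / y);
}
```

Comparing its output against `std::atan2` over a grid of angles is a quick way to measure the precision gap that the comment above asks about.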
What are some alternatives?
simdutf8 - SIMD-accelerated UTF-8 validation for Rust.
GLM - OpenGL Mathematics (GLM)
simde - Implementations of SIMD instruction sets for systems which don't natively support them.
highway - Performance-portable, length-agnostic SIMD with runtime dispatch
eve - Expressive Vector Engine - SIMD in C++ Goes Brrrr
libjxl - JPEG XL image format reference implementation
Vc - SIMD Vector Classes for C++
Fastor - A lightweight high performance tensor algebra framework for modern C++
simdjson - Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
glibc - GNU Libc
colrcv - C Library for converting Colours between different Colour Models