-
optimization-manual
Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual"
-
src
Read-only git conversion of OpenBSD's official CVS src repository. Pull requests not accepted - send diffs to the tech@ mailing list.
This Rust issue [0] was the best short summary of what a SIMD shuffle is that I could find:
„A "shuffle", in SIMD terms, takes a SIMD vector (or possibly two vectors) and a pattern of source lane indexes (usually as an immediate), and then produces a new SIMD vector where the output is the source lane values in the pattern given.“
[0] https://github.com/rust-lang/portable-simd/issues/11
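To make the quoted definition concrete, here is a minimal scalar model of a shuffle in Python. It is only a sketch of the semantics, not real SIMD code. The function names `shuffle` and `swizzle` are made up for illustration, and the index convention (0..n-1 picks from the first vector, n..2n-1 from the second) is an assumption borrowed from how LLVM's shufflevector numbers its lanes.

```python
def shuffle(a, b, pattern):
    """Scalar model of a two-source SIMD shuffle (hypothetical helper).

    Assumed convention: index i < len(a) selects a[i],
    index i >= len(a) selects b[i - len(a)].
    """
    if len(a) != len(b):
        raise ValueError("both source vectors must have the same lane count")
    combined = list(a) + list(b)
    # Each output lane simply copies the source lane named by the pattern.
    return [combined[i] for i in pattern]


def swizzle(a, pattern):
    """Single-source variant: every index refers to a lane of `a`."""
    return [a[i] for i in pattern]
```

For example, `swizzle([10, 20, 30, 40], [3, 2, 1, 0])` reverses the lanes, and `shuffle([10, 20, 30, 40], [50, 60, 70, 80], [0, 4, 1, 5])` interleaves the low halves of the two inputs. On real hardware the pattern is usually a compile-time immediate, as the quote says.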
The Intel optimization manual has a fun example where they use vpconflict for vectorizing sparse dot products: https://github.com/intel/optimization-manual/blob/main/chap1...
I benchmarked it on Intel, and it was indeed quite fast, a good improvement over the scalar version. It will be interesting to try it on AMD.
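For readers who haven't met the instruction: a rough scalar model of what vpconflictd computes, written in Python. This is not the code from the Intel manual and says nothing about how their sparse dot product is structured; the function name `conflict` is made up for illustration. For each lane, the result is a bitmask of the earlier (lower-indexed) lanes holding the same value.

```python
def conflict(indices):
    """Scalar model of vpconflictd (hypothetical helper name).

    Lane i of the result has bit j set when j < i and
    indices[j] == indices[i].
    """
    out = []
    for i, value in enumerate(indices):
        mask = 0
        for j in range(i):  # only lanes closer to the least significant end
            if indices[j] == value:
                mask |= 1 << j
        out.append(mask)
    return out
```

A lane whose mask is zero has no earlier duplicate. That is the property that makes gather/modify/scatter sequences safe to vectorize when indices may repeat: `conflict([3, 7, 3, 3])` flags lanes 2 and 3 as colliding with earlier lanes.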
Loading data from random memory locations has become too expensive compared to computation. For log, exp, trigonometry, and similar functions, I think people rarely use lookup tables anymore. Instead, they use high-degree polynomial approximations, and for log/exp they abuse the IEEE binary floating-point representation.
Here's a log() function from the standard library in OpenBSD: https://github.com/openbsd/src/blob/master/lib/libm/src/e_lo...
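A minimal Python sketch of that idea, under stated assumptions: it is not the OpenBSD code, the name `fast_log` is made up, and the polynomial is a plain truncated series rather than tuned minimax coefficients like a real libm would use. It pulls the exponent straight out of the IEEE 754 double bits, so that log(x) = e*ln(2) + log(m), and approximates log(m) with a short polynomial in s = (m - 1)/(m + 1).

```python
import math
import struct

LN2 = math.log(2.0)
SQRT2 = math.sqrt(2.0)


def fast_log(x: float) -> float:
    """Approximate ln(x) for positive, normal, finite doubles (sketch only)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exp_field = (bits >> 52) & 0x7FF
    if (bits >> 63) or exp_field == 0 or exp_field == 0x7FF:
        # Negative, zero, subnormal, inf and NaN are out of scope here.
        raise ValueError("fast_log handles positive normal doubles only")

    exponent = exp_field - 1023
    # Overwrite the exponent field with the bias: mantissa m lands in [1, 2).
    m_bits = (bits & 0x000FFFFFFFFFFFFF) | (1023 << 52)
    m = struct.unpack("<d", struct.pack("<Q", m_bits))[0]

    # Recentre m into [sqrt(1/2), sqrt(2)] so that |s| stays small.
    if m > SQRT2:
        m *= 0.5
        exponent += 1

    # ln(m) = 2*atanh(s) = 2*(s + s^3/3 + s^5/5 + ...), truncated after s^9.
    s = (m - 1.0) / (m + 1.0)
    s2 = s * s
    poly = 1.0 + s2 * (1.0 / 3 + s2 * (1.0 / 5 + s2 * (1.0 / 7 + s2 * (1.0 / 9))))
    return exponent * LN2 + 2.0 * s * poly
```

No table lookups and no data-dependent memory access: just bit twiddling, one division, and a Horner-form polynomial, which is why this shape maps well onto SIMD lanes. With this truncation the result agrees with `math.log` to roughly nine decimal places on the inputs I tried by hand-estimating the series remainder; a production version would use fitted coefficients and handle the special cases.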
BTW, since you're apparently working on stuff like that, check out this repository:
https://github.com/Const-me/AvxMath/blob/master/AvxMath/AvxM...
The license is MIT, copy-paste friendly. It doesn't use AVX512 though, only AVX1 and optionally AVX2.