Emulating AMD Approximate Arithmetic Instructions on Intel

embree-aarch64

1 45 4.9 C++

Discontinued AARCH64 port of Embree ray tracing library

Yikes.
A lot of code uses _mm_rsqrt_ps, (sometimes) followed by a Newton-Raphson update, to compute a "precise" 1/sqrt(x). Here's a good example of NEON's rsqrt being different enough from Intel's that more iterations were necessary for Embree on ARM [1].
My vectorization work was a long time ago, when AMD was quite uncompetitive, so I'd bet a lot of code assumes that the SSE rsqrtps values match Intel's.
[1] https://github.com/lighttransport/embree-aarch64/issues/20
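To make the "more iterations" point concrete, here is a minimal, portable sketch of the Newton-Raphson update y' = y * (1.5 - 0.5 * x * y * y). It does not call _mm_rsqrt_ps or any NEON intrinsic. Instead, coarse_rsqrt is a hypothetical stand-in that keeps only the top few mantissa bits of the exact result, and the 8-bit and 12-bit settings are illustrative assumptions rather than measurements of any particular CPU. All function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware rsqrt estimate: the correctly
// rounded float 1/sqrt(x), truncated to its top `bits` mantissa bits.
inline float coarse_rsqrt(float x, int bits) {
    assert(bits >= 1 && bits <= 23);
    float y = 1.0f / std::sqrt(x);
    std::uint32_t u;
    std::memcpy(&u, &y, sizeof u);                    // view the float's bit pattern
    u &= ~((std::uint32_t{1} << (23 - bits)) - 1u);   // clear the low mantissa bits
    std::memcpy(&y, &u, sizeof y);
    return y;
}

// One Newton-Raphson step for f(y) = 1/(y*y) - x.
// A relative error e in y becomes roughly 1.5 * e * e afterwards.
inline float refine_rsqrt(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

// Relative error of y against a double-precision reference for 1/sqrt(x).
inline double rel_error(float x, float y) {
    const double exact = 1.0 / std::sqrt(static_cast<double>(x));
    return std::fabs(static_cast<double>(y) - exact) / exact;
}
```

Because each step roughly squares the relative error, a coarser starting estimate can need one more step to reach the same accuracy. For x = 3, a single refine_rsqrt call on coarse_rsqrt(3.0f, 12) already gets close to float precision, while coarse_rsqrt(3.0f, 8) needs two calls to get there.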

eigen

19 - -

(Too late for edit)
Looks like Eigen also defaults to EIGEN_FAST_MATH, which makes Eigen's psqrt ("packet sqrt") use _mm256_rsqrt_ps instead of _mm256_sqrt_ps [1].
Interestingly, the thing they're trying to avoid (the long latency of sqrt vs. rsqrt) hasn't been true for a long time on Intel processors, but apparently is still true for AMD parts according to Agner Fog's tables [2]. Maybe I'm reading them wrong, though: there is no vsqrtps entry for Zen2/3.
[1] https://gitlab.com/libeigen/eigen/-/blob/a75122584594fb98db0...
[2] https://agner.org/optimize/instruction_tables.pdf
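For context, a common way to build sqrt on top of an rsqrt estimate is sqrt(x) = x * rsqrt(x). The sketch below illustrates that general idea only; it is not Eigen's actual psqrt code. rsqrt_estimate is a hypothetical scalar stand-in (real code would use an intrinsic there), and the other names are invented for the example. The main thing it shows is that zero and infinity need special handling.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for a hardware rsqrt estimate; kept scalar and
// portable on purpose. Returns +inf for x == 0 under IEEE 754 rules.
inline float rsqrt_estimate(float x) {
    return 1.0f / std::sqrt(x);
}

// Naive version: for x == 0 this computes 0 * inf, which is NaN.
inline float sqrt_via_rsqrt_naive(float x) {
    return x * rsqrt_estimate(x);
}

// Guarded version with one Newton-Raphson refinement of the estimate.
inline float sqrt_via_rsqrt(float x) {
    // Zero and +infinity would both produce NaN below, so pass them through.
    if (x == 0.0f || x == std::numeric_limits<float>::infinity()) return x;
    float y = rsqrt_estimate(x);
    y = y * (1.5f - 0.5f * x * y * y);  // refine 1/sqrt(x)
    return x * y;                       // x * (1/sqrt(x)) == sqrt(x)
}
```

The sketch assumes default IEEE 754 floating-point behavior; flags such as -ffast-math can change how the zero and infinity cases behave.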

mu

29 1,342 4.3 Assembly

Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society. (by akkartik)

I wrote something up when I ran into these instructions last year: https://github.com/akkartik/mu/blob/main/linux/x86_approx.md

math_routines

1 0 0.0 C++

I investigated the differences between the rsqrt and rcp instructions on Intel and AMD platforms back in 2016, and drafted a note with my findings. See the file rsqrt_rcp/docs/rsqrt_rcp.pdf in the git repository https://github.com/jeff-arnold/math_routines.
Some conclusions:
