Emulating AMD Approximate Arithmetic Instructions on Intel

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • embree-aarch64

    Discontinued AARCH64 port of Embree ray tracing library

  • Yikes.

    A lot of code uses _mm_rsqrt_ps, sometimes followed by a Newton-Raphson update, to compute a "precise" 1/sqrt(x). Here's a good example of NEON's rsqrt being sufficiently different from Intel's that more iterations were necessary for Embree on ARM [1].

    Because I only cared about vectorization a long time ago, and AMD was so uncompetitive then, I'd bet a lot of code assumes that the SSE rsqrtps values match.

    [1] https://github.com/lighttransport/embree-aarch64/issues/20
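
    To make the pattern in this comment concrete, here is a minimal x86-only sketch in C of an rsqrt estimate followed by Newton-Raphson refinement. The helper names (rsqrt_estimate, rsqrt_refine, rsqrt_nr) are hypothetical and not taken from Embree or any other project mentioned here. The exact bits returned by the estimate instruction are implementation-specific, so code that hard-codes the number of refinement steps is implicitly relying on one vendor's accuracy.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: _mm_rsqrt_ss, _mm_set_ss, _mm_cvtss_f32 */

/* Hardware estimate of 1/sqrt(x). On x86 the documented relative error
   bound is 1.5 * 2^-12; the exact result may differ between CPU vendors. */
static float rsqrt_estimate(float x) {
    assert(x > 0.0f);
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* One Newton-Raphson step for f(y) = 1/y^2 - x:
   y' = y * (1.5 - 0.5 * x * y^2). */
static float rsqrt_refine(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

/* Hypothetical helper: estimate plus a caller-chosen number of
   refinement steps. */
static float rsqrt_nr(float x, int steps) {
    float y = rsqrt_estimate(x);
    for (int i = 0; i < steps; ++i)
        y = rsqrt_refine(x, y);
    return y;
}
```

    Each step roughly squares the relative error of the estimate, which is why a coarser starting estimate on another architecture can need an extra iteration to reach the same accuracy.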

  • eigen

  • (Too late for edit)

    Looks like Eigen also defaults to EIGEN_FAST_MATH, which makes Eigen's psqrt ("packet sqrt") use _mm256_rsqrt_ps instead of _mm256_sqrt_ps [1].

    Interestingly, the thing they're trying to avoid (long latency of sqrt vs rsqrt) hasn't been true for a long time on Intel processors, but apparently is still true for AMD parts according to Agner Fog's tables [2] (though maybe I'm reading them wrong; there is no vsqrtps entry for Zen2/3).

    [1] https://gitlab.com/libeigen/eigen/-/blob/a75122584594fb98db0...

    [2] https://agner.org/optimize/instruction_tables.pdf
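
    For readers unfamiliar with the trick, here is a rough sketch in C of computing a square root from the reciprocal-square-root estimate. This is not Eigen's source: it uses the 128-bit SSE intrinsic rather than the AVX one named above, and the function names are hypothetical. It assumes non-negative, finite inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* sqrt(x) ~ x * rsqrt(x), with one Newton-Raphson step on the estimate. */
static __m128 sqrt_via_rsqrt_ps(__m128 x) {
    const __m128 half = _mm_set1_ps(0.5f);
    const __m128 three_halves = _mm_set1_ps(1.5f);

    __m128 y = _mm_rsqrt_ps(x);  /* coarse, vendor-specific estimate */
    __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);
    y = _mm_mul_ps(y, _mm_sub_ps(three_halves, _mm_mul_ps(half, xyy)));
    __m128 r = _mm_mul_ps(x, y);

    /* rsqrt(0) is +inf and 0 * inf is NaN, so force lanes where
       x == 0 back to 0 by masking them out. */
    __m128 is_zero = _mm_cmpeq_ps(x, _mm_setzero_ps());
    return _mm_andnot_ps(is_zero, r);
}

/* Hypothetical convenience wrapper over plain arrays. */
static void sqrt_via_rsqrt_array(const float *in, float *out, size_t n) {
    assert(n % 4 == 0);  /* sketch only: no scalar tail handling */
    for (size_t i = 0; i < n; i += 4)
        _mm_storeu_ps(out + i, sqrt_via_rsqrt_ps(_mm_loadu_ps(in + i)));
}
```

    Because the result is built on the approximate instruction, the low bits of the output can differ between CPUs, unlike a true sqrt instruction, which is correctly rounded.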

  • mu

    Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society. (by akkartik)

  • I wrote something up when I ran into these instructions last year: https://github.com/akkartik/mu/blob/main/linux/x86_approx.md

  • math_routines

  • I investigated the differences between the rsqrt and rcp instructions on Intel and AMD platforms back in 2016, and drafted a note with my findings. See the file rsqrt_rcp/docs/rsqrt_rcp.pdf in the git repository https://github.com/jeff-arnold/math_routines.

    Some conclusions:
