Good article, but note that if the hardware supports a division instruction, using it will be much faster than the described workarounds.
Personally, I recently did what the article describes in two cases: FP32 division on ARMv7, and FP64 division on GPUs that don’t support that instruction.
For ARM CPUs, not only do they have FRECPE for the initial estimate, they also have FRECPS for the iteration step. An example is here: https://github.com/microsoft/DirectXMath/blob/jan2021/Inc/Di...
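To make the estimate-plus-step pattern concrete, here is a minimal sketch of my own, not taken from the linked DirectXMath source. It uses the NEON intrinsics `vrecpeq_f32` and `vrecpsq_f32`, which compile to VRECPE/VRECPS on ARMv7 and FRECPE/FRECPS on AArch64. The function name `divide4` and the choice of two refinement steps are assumptions for illustration. The `#else` branch is only a scalar stand-in so the snippet builds on non-ARM machines; it uses a crude exponent-based guess, is much less accurate than the NEON path, and assumes positive, normal, finite divisors.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical helper: out[i] = n[i] / d[i] for four floats, without a divide instruction.
void divide4(const float* n, const float* d, float* out) {
    assert(n && d && out);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t vn = vld1q_f32(n);
    float32x4_t vd = vld1q_f32(d);
    float32x4_t r = vrecpeq_f32(vd);        // coarse estimate of 1/d
    r = vmulq_f32(vrecpsq_f32(vd, r), r);   // vrecps returns (2 - d*r), so r *= (2 - d*r)
    r = vmulq_f32(vrecpsq_f32(vd, r), r);   // second Newton-Raphson step
    vst1q_f32(out, vmulq_f32(vn, r));       // n * (1/d)
#else
    // Scalar stand-in (assumption, not an emulation of the NEON estimate).
    for (int i = 0; i < 4; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &d[i], sizeof bits);
        bits = 0x7F000000u - bits;          // negate the exponent: rough guess at 1/d
        float r;
        std::memcpy(&r, &bits, sizeof r);
        r = r * (2.0f - d[i] * r);          // same iteration, written out by hand
        r = r * (2.0f - d[i] * r);
        out[i] = n[i] * r;
    }
#endif
}
```

Each step roughly squares the relative error of the reciprocal, so how many steps you need depends on how good the initial estimate is and how much precision you want.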
For GPUs, Microsoft classified FP64 division as an “extended double shader instruction”, and support for it is optional. However, GPUs are guaranteed to support FP32 division. The result of an FP32 division provides an awesome starting point for Newton-Raphson refinement in FP64 precision.
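The same idea can be sketched on the CPU in plain C++, which is easier to try out than a shader. This is an illustration under stated assumptions, not shader code and not a production routine: the name `divide_fp64` is hypothetical, the final residual correction is an extra I added, and the sketch only handles divisors whose FP32 reciprocal is finite and nonzero.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: approximates n / d using only FP32 division plus FP64 multiply-add.
double divide_fp64(double n, double d) {
    float df = static_cast<float>(d);
    assert(df != 0.0f);
    float rf = 1.0f / df;                       // FP32 divide: about 24 good bits
    assert(std::isfinite(rf) && rf != 0.0f);    // sketch only covers divisors in FP32 range
    double r = static_cast<double>(rf);         // seed for 1/d
    r = std::fma(r, std::fma(-d, r, 1.0), r);   // r += r * (1 - d*r)
    r = std::fma(r, std::fma(-d, r, 1.0), r);   // second step reaches FP64 precision
    double q = n * r;                           // first quotient estimate
    q = std::fma(r, std::fma(-d, q, n), q);     // q += r * (n - d*q), residual correction
    return q;
}
```

Because each step roughly doubles the number of correct bits, two steps are enough to go from an FP32-quality seed to FP64 precision. The result may still differ from a correctly rounded division in the last bit.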