SIMD for C++ Developers [pdf]

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • highway

    Performance-portable, length-agnostic SIMD with runtime dispatch

  • Nice writeup with helpful diagrams, thanks for sharing!

    Readers might also find this short intro [1] helpful, including tips on porting. (Disclosure: author)

    1: https://github.com/google/highway/blob/master/g3doc/highway_...

    > many available instructions are missing from the wrappers

    Highway can interop with platform-specific intrinsics (on x86/ARM, hwy_vec.raw is the native intrinsic type).

    > vectorized integer math often treats vectors as having different lanes count on every line of code

    Fair point, that's a cost of type safety. We usually write `auto` to avoid spelling it out.
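The type-safety trade-off mentioned above can be illustrated with a toy model. This is a minimal sketch, not Highway's API: `Vec`, `add`, `widen_lower` and `sum_lower_bytes` are hypothetical names, and the "vector" is a plain 16-byte array rather than a real SIMD register. It only shows how the lane type and lane count become part of the C++ type, and how `auto` avoids spelling them out when an operation changes them.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical toy type, NOT Highway's API: a fixed 16-byte "register"
// whose lane type T and lane count N are part of the C++ type.
template <typename T, std::size_t N>
struct Vec {
    static_assert(sizeof(T) * N == 16, "toy vectors are always 16 bytes");
    std::array<T, N> lanes{};
};

// Lane-wise add. Both operands must share lane type and lane count;
// mixing Vec<uint8_t, 16> with Vec<uint16_t, 8> fails to compile.
template <typename T, std::size_t N>
Vec<T, N> add(const Vec<T, N>& a, const Vec<T, N>& b) {
    Vec<T, N> r;
    for (std::size_t i = 0; i < N; ++i) {
        r.lanes[i] = static_cast<T>(a.lanes[i] + b.lanes[i]);
    }
    return r;
}

// Widen the lower 8 of 16 byte lanes into 8 lanes of 16 bits each.
// The result has a different lane count, hence a different type.
inline Vec<std::uint16_t, 8> widen_lower(const Vec<std::uint8_t, 16>& v) {
    Vec<std::uint16_t, 8> r;
    for (std::size_t i = 0; i < 8; ++i) {
        r.lanes[i] = v.lanes[i];
    }
    return r;
}

// Sum the lower 8 bytes without overflowing 8-bit lanes.
inline std::uint32_t sum_lower_bytes(const Vec<std::uint8_t, 16>& bytes) {
    auto wide = widen_lower(bytes);   // deduced as Vec<uint16_t, 8>
    auto doubled = add(wide, wide);   // still Vec<uint16_t, 8>
    std::uint32_t total = 0;
    for (auto lane : doubled.lanes) {
        total += lane;
    }
    return total / 2;                 // undo the doubling
}
```

With raw intrinsics, both `bytes` and `wide` would typically share one untyped integer register type, so the change in lane count is invisible to the compiler; here it is checked at compile time.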

  • CppSPMD_Fast

    Optimized CppSPMD test project: macro control flow, SSE4.1/AVX1/AVX2/AVX2 FMA support

  • Switching compilers is often too high-risk, but there are header-only libraries that get you most of the same benefits with normal C++ and wrappers around the intrinsics: https://github.com/richgel999/CppSPMD_Fast

  • DirectXMath

    DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

  • For videogame applications, look at this library before writing these intrinsics yourself: https://github.com/microsoft/DirectXMath/ It already implements a lot of complicated things, and does so relatively well.

    Here’s the code for frustum culling: https://github.com/microsoft/DirectXMath/blob/jan2021/Inc/Di... It is relatively inefficient when you have many boxes to test against the same frustum, but (a) the compiler may inline and optimize it, and (b) failing that, it’s easy to copy-paste and optimize manually: compute these 6 planes once and call the BoundingBox::ContainedBy method yourself.
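The "compute the 6 planes once, then test many boxes" idea from the comment above can be sketched in plain scalar C++. This is an illustration only and does not use DirectXMath: `Plane`, `Box`, `FrustumPlanes`, `intersects` and `count_visible` are hypothetical names, the planes are assumed to be already computed with normals pointing into the frustum, and the test is the usual conservative one (it can report a box near a frustum corner as visible when it is not).

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not DirectXMath's API.
// A point p is on the inside of a plane when n.p + d >= 0.
struct Plane { float nx, ny, nz, d; };

// Axis-aligned box stored as center and half-extents.
struct Box { float cx, cy, cz, ex, ey, ez; };

// The 6 planes are computed once per frustum and reused for every box.
using FrustumPlanes = std::array<Plane, 6>;

// Returns false only when the box lies entirely outside some plane.
inline bool intersects(const FrustumPlanes& planes, const Box& b) {
    for (const Plane& p : planes) {
        // Signed distance from the box center to the plane.
        const float dist = p.nx * b.cx + p.ny * b.cy + p.nz * b.cz + p.d;
        // Projection of the box half-extents onto the plane normal.
        const float radius = std::fabs(p.nx) * b.ex +
                             std::fabs(p.ny) * b.ey +
                             std::fabs(p.nz) * b.ez;
        if (dist + radius < 0.0f) {
            return false;  // fully on the outside of this plane
        }
    }
    return true;
}

// Tests many boxes against the same precomputed planes.
inline std::size_t count_visible(const FrustumPlanes& planes,
                                 const std::vector<Box>& boxes) {
    std::size_t visible = 0;
    for (const Box& b : boxes) {
        if (intersects(planes, b)) {
            ++visible;
        }
    }
    return visible;
}
```

The inner loop touches only the precomputed plane coefficients and the box data, which is the part that benefits from inlining or from a manual SIMD rewrite.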
