Optimizing compilers reload vector constants needlessly

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • compiler-explorer

    Run compilers interactively from your web browser and interact with the assembly

  • It should be part of these discussions to prove what you claim. Always. With code samples fed directly to the compiler, and the corresponding assembly.

    https://godbolt.org/

    Statistics alone are worthless. In the end, all that counts is the arena of performance: what the code becomes, and how it runs against the handcrafted version.

  • OpenBLAS

    OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

  • std-simd

    std::experimental::simd for GCC [ISO/IEC TS 19570:2018]

  • Bad news. For SIMD there are no cross-platform intrinsics. Intel intrinsics map directly to SSE/AVX instructions and ARM intrinsics map directly to NEON instructions.

    For cross-platform, your best bet is probably https://github.com/VcDevel/std-simd

    There's https://eigen.tuxfamily.org/index.php?title=Main_Page but it's tremendously complicated for anything other than large-scale linear algebra.

    And there's https://github.com/microsoft/DirectXMath but it has obvious biases :P
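
As a rough illustration of the std-simd suggestion in the comment above, the following C++ sketch writes one loop against std::experimental::simd instead of SSE or NEON intrinsics. It assumes GCC with a libstdc++ that ships <experimental/simd> (GCC 11 or later, C++17) and falls back to a plain scalar loop everywhere else. The function name saxpy and the macro HAVE_STD_SIMD are hypothetical names chosen for this example, not part of any linked project.

```cpp
#include <cassert>
#include <cstddef>

// Conservative assumption: only enable the SIMD path on GCC + libstdc++ in C++17 mode.
#if defined(__GLIBCXX__) && !defined(__clang__) && __cplusplus >= 201703L
#  if __has_include(<experimental/simd>)
#    include <experimental/simd>
#    define HAVE_STD_SIMD 1  // hypothetical feature macro for this sketch
#  endif
#endif

// Hypothetical helper: computes y[i] = a * x[i] + y[i] for i in [0, n).
void saxpy(float a, const float* x, float* y, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    std::size_t i = 0;
#ifdef HAVE_STD_SIMD
    namespace stdx = std::experimental;
    using V = stdx::native_simd<float>;  // lane count is picked for the target ISA
    const V va(a);                       // broadcast the scalar to every lane
    for (; i + V::size() <= n; i += V::size()) {
        V vx(x + i, stdx::element_aligned);  // unaligned-safe load
        V vy(y + i, stdx::element_aligned);
        vy = va * vx + vy;
        vy.copy_to(y + i, stdx::element_aligned);
    }
#endif
    // Scalar tail, and the whole loop when the SIMD path is unavailable.
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same source should build for x86 and ARM targets without touching platform intrinsics. Which instructions the compiler actually emits is worth checking on Compiler Explorer rather than assuming.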

  • DirectXMath

    DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

  • FFmpeg

    Mirror of https://git.ffmpeg.org/ffmpeg.git

  • highway

    Performance-portable, length-agnostic SIMD with runtime dispatch

  • __builtin_shufflevector requires a vector length known at compile time, and it can be pessimized: the compiler may fuse two shuffles into one general all-to-all permute, which is more expensive than the two simple shuffles.

    Also, vqsort (https://github.com/google/highway/tree/master/hwy/contrib/so...) almost entirely consists of
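
To make the __builtin_shufflevector point in the comment above concrete, here is a small C++ sketch. It assumes the GCC/Clang vector extension (it will not build with MSVC) and uses __builtin_shufflevector only where the compiler reports it, with an element-wise fallback otherwise. The names v4i, SHUF4, reverse, swap_pairs, reverse_then_swap and check_fusion_equivalence are hypothetical, chosen for illustration. Note that the lane count is fixed by the type and the indices must be compile-time constants, which is the "known vector length" restriction.

```cpp
#include <cassert>
#include <cstdint>

// Four 32-bit lanes, via the GCC/Clang vector extension (an assumption of this sketch).
typedef std::int32_t v4i __attribute__((vector_size(16)));

#if defined(__has_builtin)
#  if __has_builtin(__builtin_shufflevector)
#    define HAVE_SHUFFLEVECTOR 1
#  endif
#endif

// Hypothetical macro: single-input shuffle with constant lane indices.
#ifdef HAVE_SHUFFLEVECTOR
#  define SHUF4(v, a, b, c, d) __builtin_shufflevector((v), (v), a, b, c, d)
#else
#  define SHUF4(v, a, b, c, d) (v4i{(v)[a], (v)[b], (v)[c], (v)[d]})
#endif

// Simple shuffle 1: reverse the lane order.
v4i reverse(v4i v) { return SHUF4(v, 3, 2, 1, 0); }

// Simple shuffle 2: swap the lanes within each adjacent pair.
v4i swap_pairs(v4i v) { return SHUF4(v, 1, 0, 3, 2); }

// Two simple shuffles in sequence. A compiler is free to fuse them into the
// single permute {2, 3, 0, 1}; whether that is cheaper depends on the target.
v4i reverse_then_swap(v4i v) { return swap_pairs(reverse(v)); }

// Checks that the two-step chain equals the single fused permute, lane by lane.
void check_fusion_equivalence() {
    const v4i v = {10, 20, 30, 40};
    const v4i two_step = reverse_then_swap(v);
    const v4i fused = SHUF4(v, 2, 3, 0, 1);
    for (int i = 0; i < 4; ++i) {
        assert(two_step[i] == fused[i]);
    }
}
```

Both forms compute the same result, so the only question is which instruction sequence the compiler picks. Comparing the assembly for reverse_then_swap across compilers and targets on Compiler Explorer is one way to see whether the fusion happens.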

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
