This thought has been churning around in my mind for some years now: we focus too much on processing speed and reductions in time complexity, and not enough on increasing the size and efficiency of our cache and stack.
MM (especially MM on large numbers, e.g. in hashing algorithms) is very reliant on the cache, because you can’t always fit that big a number into a register. Side note: I was reading some Abseil code last night that did some funky bit twiddling on ARM: https://github.com/abseil/abseil-cpp/blob/master/absl/hash/i...
Off the top of my head, isn’t it on the order of 200 cycles (roughly 100 ns) to issue the request, cross the bus, and read something from main memory? Just a thought, but perhaps the cache and memory are where we should focus our efforts.
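The cost of missing the cache is easy to demonstrate: the two functions below sum the same row-major array, differing only in loop order. This is a minimal sketch (the array size and function names are made up for illustration); the column-order version typically runs several times slower on real hardware because nearly every load misses the cache line brought in by the previous one.

```c
#include <stddef.h>

#define DIM 1024

/* Row-major traversal: consecutive elements share cache lines,
 * so most accesses hit in L1. */
double sum_rows(double a[DIM][DIM]) {
    double s = 0.0;
    for (size_t i = 0; i < DIM; i++)
        for (size_t j = 0; j < DIM; j++)
            s += a[i][j];
    return s;
}

/* Column-order traversal of the same row-major array: each access
 * jumps DIM * sizeof(double) bytes, so almost every load touches a
 * new cache line and pays the memory-latency cost discussed above. */
double sum_cols(double a[DIM][DIM]) {
    double s = 0.0;
    for (size_t j = 0; j < DIM; j++)
        for (size_t i = 0; i < DIM; i++)
            s += a[i][j];
    return s;
}
```

Both return the same value; only the number of cache misses differs, which is exactly the kind of effect raw time-complexity analysis hides.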
-
However, on recent CPUs 4x4 is small for the innermost block size of the non-trivial blocking hierarchy you need. You can see examples under https://github.com/flame/blis/tree/master/config, with an a priori procedure for determining them in https://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analyti... (but compare with what's actually used for SKX, in particular). OpenBLAS is normally similar, and may come out somewhat faster, but the structure is easier to see in BLIS.
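To make the "innermost block" concrete, here is a minimal sketch of register blocking around a 4x4 micro-kernel, in the style BLIS organizes its GEMM. All names and the choice MR = NR = 4 are illustrative only; real BLIS kernels use larger, architecture-specific tiles (which is the point of the comment above), plus packing and several outer levels of cache blocking that this sketch omits. Dimensions are assumed to be multiples of 4.

```c
#include <stddef.h>

enum { MR = 4, NR = 4 };  /* register-block sizes (illustrative) */

/* Innermost level: accumulate a MR x NR tile of C in a local array
 * (which the compiler can keep in registers) while streaming once
 * through a MR x k panel of A and a k x NR panel of B. */
static void micro_kernel_4x4(size_t k,
                             const double *A, size_t lda,
                             const double *B, size_t ldb,
                             double *C, size_t ldc) {
    double c[MR][NR] = {{0.0}};
    for (size_t p = 0; p < k; p++)
        for (size_t i = 0; i < MR; i++)
            for (size_t j = 0; j < NR; j++)
                c[i][j] += A[i * lda + p] * B[p * ldb + j];
    for (size_t i = 0; i < MR; i++)
        for (size_t j = 0; j < NR; j++)
            C[i * ldc + j] += c[i][j];
}

/* C += A * B, all row-major; m and n assumed multiples of MR/NR.
 * A real implementation adds packing and cache-level blocking here. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *A, const double *B, double *C) {
    for (size_t i = 0; i < m; i += MR)
        for (size_t j = 0; j < n; j += NR)
            micro_kernel_4x4(k, A + i * k, k, B + j, n,
                             C + i * n + j, n);
}
```

The hierarchy matters because each level keeps one operand resident in one level of the memory system; the analytical model linked above derives the block sizes from the cache parameters.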