sgemm_riscv
This project records the process of optimizing SGEMM (single-precision general matrix multiplication) on the RISC-V platform.
this sgemm progress is pretty impressive. i assume 'version 1' is one of the lower lines on the first graph? he has a graph lower down that shows it getting under 300 megaflops, closer to 50 megaflops for most problem sizes. getting up to 2 gigaflops is seriously impressive; that's 50% of the theoretical peak throughput, which is really hard to reach on anything that isn't a vector supercomputer
probably worth archiving this link: https://github.com/Zhao-Dongyu/sgemm_riscv
i was astounded yesterday to find that gcc 12.2.0 fails at some basic optimizations on risc-v. (all of the following is with -Os)
this trivial subroutine
int sumarray(int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
I don't believe there is any official documentation on this, but https://github.com/JuliaLang/julia/pull/49430, for example, added prefetching to the marking phase of Julia's GC, which saw speedups on x86 but not on M1.
yep. they have a neural engine that is separate from the CPU and GPU that does really fast matmuls https://github.com/hollance/neural-engine. it's basically completely undocumented.
I am talking about the matrix/vector coprocessor (AMX). You can find some reverse-engineered documentation here: https://github.com/corsix/amx
On M3 a single matrix block can achieve ~1 TFLOP/s on DGEMM; I assume it will be closer to 4 TFLOP/s for SGEMM. The Max variants have two such blocks. I didn't do precise benchmarking myself, but switching Python/R matrix libraries to use Apple's BLAS results in a 5-6x perf improvement on matrix-heavy code for me.
There is a recent update to BLIS, an alternative to BLAS, that includes a number of RISC-V performance optimizations.
https://github.com/flame/blis/pull/737