WebAssembly Techniques to Speed Up Matrix Multiplication by 120x

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • gemm-benchmark

    Simple [sd]gemm benchmark, similar to ACES dgemm

  • There's always been a tradeoff between developer experience and taking full advantage of what the hardware is capable of. That "waste" in execution efficiency is often worth it for the sake of helpful abstractions and overall developer productivity.

    The GFLOP/s here is about 1/28th of what you'd get from the native Accelerate framework on M1 Macs [1]. I'm all for powerful abstractions, but not using native APIs for this (even if it's just the browser calling Accelerate in some way) is a huge waste of everyone's CPU cycles and electricity. (See the Accelerate sgemm sketch after this list.)

    [1] https://github.com/danieldk/gemm-benchmark#1-to-16-threads

  • wasmblr

    C++ WebAssembly assembler in a single header file

  • That's a good point: you certainly could. There's some fun exploration to be done with atomic operations.

    The issue is that threaded execution requires cross-origin isolation, which isn't trivial to integrate. (Example server that serves the required headers: https://github.com/bwasti/wasmblr/blob/main/thread_example/s...) A minimal header-serving sketch also appears after this list.

  • XNNPACK

    High-efficiency floating-point neural network inference operators for mobile, server, and Web
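
For context on the Accelerate comparison above: the sketch below is an illustration, not code from the linked benchmark. It calls Accelerate's cblas_sgemm from Python via ctypes and reports GFLOP/s. The framework path is the standard macOS location; the matrix size and the single, unwarmed timing run are simplifying assumptions.

    import ctypes
    import time

    import numpy as np

    # Assumed standard location of the Accelerate framework binary (macOS only).
    accelerate = ctypes.CDLL(
        "/System/Library/Frameworks/Accelerate.framework/Accelerate"
    )

    sgemm = accelerate.cblas_sgemm
    sgemm.restype = None
    sgemm.argtypes = [
        ctypes.c_int, ctypes.c_int, ctypes.c_int,       # order, transA, transB
        ctypes.c_int, ctypes.c_int, ctypes.c_int,       # M, N, K
        ctypes.c_float, ctypes.c_void_p, ctypes.c_int,  # alpha, A, lda
        ctypes.c_void_p, ctypes.c_int,                   # B, ldb
        ctypes.c_float, ctypes.c_void_p, ctypes.c_int,  # beta, C, ldc
    ]

    # Standard CBLAS enum values.
    ROW_MAJOR, NO_TRANS = 101, 111

    n = 1024  # assumed problem size
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    c = np.zeros((n, n), dtype=np.float32)

    # A real benchmark would warm up and average many runs;
    # a single timed call keeps the sketch short.
    start = time.perf_counter()
    sgemm(ROW_MAJOR, NO_TRANS, NO_TRANS, n, n, n,
          1.0, a.ctypes.data, n,
          b.ctypes.data, n,
          0.0, c.ctypes.data, n)
    elapsed = time.perf_counter() - start

    # An n x n single-precision matmul performs 2 * n^3 floating-point operations.
    print(f"{2 * n**3 / elapsed / 1e9:.1f} GFLOP/s")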

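On the cross-origin isolation point above: browsers only enable SharedArrayBuffer (and therefore WebAssembly threads) on pages served with the Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp response headers. The sketch below is a minimal Python static-file server that adds them; it is an illustration, not the server linked from the wasmblr repository, and the port is arbitrary.

    # Minimal static file server that enables cross-origin isolation by adding
    # the COOP/COEP headers browsers require for SharedArrayBuffer / wasm threads.
    from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

    class CrossOriginIsolatedHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            self.send_header("Cross-Origin-Opener-Policy", "same-origin")
            self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
            super().end_headers()

    if __name__ == "__main__":
        # Serve the current directory; the port is an arbitrary choice.
        ThreadingHTTPServer(("localhost", 8080), CrossOriginIsolatedHandler).serve_forever()

With those headers in place, crossOriginIsolated is true in the page and a shared-memory WebAssembly module can be run across workers.
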
NOTE: The mention count for each project covers appearances in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.

Related posts

  • Xnnpack: High-efficiency floating-point neural network inference operators

    1 project | news.ycombinator.com | 25 Dec 2023
  • Can a NPU be used for vectors?

    1 project | /r/RISCV | 29 Aug 2023
  • [Discussion] Is XNNPACK a part of mediapipe? or should be additionally configured with mediapipe?

    1 project | /r/opencv | 29 Jan 2022
  • Where are Nvidia's DLSS models stored and how big are they?

    1 project | /r/hardware | 28 Mar 2021
  • Performance critical ML: How viable is Rust as an alternative to C++

    4 projects | /r/rust | 2 May 2023