WebAssembly Techniques to Speed Up Matrix Multiplication by 120x

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • gemm-benchmark

    Simple [sd]gemm benchmark, similar to ACES dgemm

  • There's always been a tradeoff between developer experience and taking full advantage of what the hardware is capable of. That "waste" in execution efficiency is often worth it for the sake of helpful abstractions and overall developer productivity.

    The GFLOP/s here is about 1/28th of what you'd get from the native Accelerate framework on M1 Macs [1]. I'm all for powerful abstractions, but not using native APIs for this (even if it's just the browser calling Accelerate in some way) is a huge waste of everyone's CPU cycles and electricity. (See the Accelerate sgemm sketch after this list.)

    [1] https://github.com/danieldk/gemm-benchmark#1-to-16-threads

  • wasmblr

    C++ WebAssembly assembler in a single header file

  • That's a good point: you certainly could. There's some fun exploration to be done with atomic operations.

    The issue is that threaded execution requires cross-origin isolation, which isn't trivial to integrate. (Example server that serves the required headers: https://github.com/bwasti/wasmblr/blob/main/thread_example/s...) A minimal header-serving sketch also appears after this list.

  • XNNPACK

    High-efficiency floating-point neural network inference operators for mobile, server, and Web
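
For context on the Accelerate comparison above: the sketch below is an illustration, not code from the linked benchmark. It calls Accelerate's cblas_sgemm from Python via ctypes and reports GFLOP/s. The framework path is the standard macOS location; the matrix size and the single, unwarmed timing run are simplifying assumptions.

    import ctypes
    import time

    import numpy as np

    # Assumed standard location of the Accelerate framework binary (macOS only).
    accelerate = ctypes.CDLL(
        "/System/Library/Frameworks/Accelerate.framework/Accelerate"
    )

    sgemm = accelerate.cblas_sgemm
    sgemm.restype = None
    sgemm.argtypes = [
        ctypes.c_int, ctypes.c_int, ctypes.c_int,       # order, transA, transB
        ctypes.c_int, ctypes.c_int, ctypes.c_int,       # M, N, K
        ctypes.c_float, ctypes.c_void_p, ctypes.c_int,  # alpha, A, lda
        ctypes.c_void_p, ctypes.c_int,                   # B, ldb
        ctypes.c_float, ctypes.c_void_p, ctypes.c_int,  # beta, C, ldc
    ]

    # Standard CBLAS enum values.
    ROW_MAJOR, NO_TRANS = 101, 111

    n = 1024  # assumed problem size
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    c = np.zeros((n, n), dtype=np.float32)

    # A real benchmark would warm up and average many runs;
    # a single timed call keeps the sketch short.
    start = time.perf_counter()
    sgemm(ROW_MAJOR, NO_TRANS, NO_TRANS, n, n, n,
          1.0, a.ctypes.data, n,
          b.ctypes.data, n,
          0.0, c.ctypes.data, n)
    elapsed = time.perf_counter() - start

    # An n x n single-precision matmul performs 2 * n^3 floating-point operations.
    print(f"{2 * n**3 / elapsed / 1e9:.1f} GFLOP/s")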

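On the cross-origin isolation point above: browsers only enable SharedArrayBuffer (and therefore WebAssembly threads) on pages served with the Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp response headers. The sketch below is a minimal Python static-file server that adds them; it is an illustration, not the server linked from the wasmblr repository, and the port is arbitrary.

    # Minimal static file server that enables cross-origin isolation by adding
    # the COOP/COEP headers browsers require for SharedArrayBuffer / wasm threads.
    from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

    class CrossOriginIsolatedHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            self.send_header("Cross-Origin-Opener-Policy", "same-origin")
            self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
            super().end_headers()

    if __name__ == "__main__":
        # Serve the current directory; the port is an arbitrary choice.
        ThreadingHTTPServer(("localhost", 8080), CrossOriginIsolatedHandler).serve_forever()

With those headers in place, crossOriginIsolated is true in the page and a shared-memory WebAssembly module can be run across workers.
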
NOTE: The mention count for each project covers appearances in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.

Related posts

  • Xnnpack: High-efficiency floating-point neural network inference operators

    1 project | news.ycombinator.com | 25 Dec 2023
  • Can a NPU be used for vectors?

    1 project | /r/RISCV | 29 Aug 2023
  • [Discussion] Is XNNPACK a part of mediapipe? or should be additionally configured with mediapipe?

    1 project | /r/opencv | 29 Jan 2022
  • Where are Nvidia's DLSS models stored and how big are they?

    1 project | /r/hardware | 28 Mar 2021
  • Performance critical ML: How viable is Rust as an alternative to C++

    4 projects | /r/rust | 2 May 2023