stablehlo
wgpu-mm
Metric | stablehlo | wgpu-mm |
---|---|---|
Mentions | 5 | 1 |
Stars | 333 | 47 |
Growth | 4.2% | - |
Activity | 9.8 | 8.7 |
Last commit | about 17 hours ago | about 2 months ago |
Language | MLIR | WGSL |
License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
stablehlo
-
Nvidia H200 Tensor Core GPU
I am going to paste a cousin comment:
StableHLO[1] is an interesting project that might help AMD here:
> Our goal is to simplify and accelerate ML development by creating more interoperability between various ML frameworks (such as TensorFlow, JAX and PyTorch) and ML compilers (such as XLA and IREE).
From there, their goal would most likely be to work with the XLA/OpenXLA teams on XLA[3] and IREE[2] to make ROCm a better backend.
[1] https://github.com/openxla/stablehlo
[2] https://github.com/openxla/iree
[3] https://www.tensorflow.org/xla
-
Chrome Ships WebGPU
Also see the recently introduced StableHLO and its serialization format: https://github.com/openxla/stablehlo/blob/main/docs/bytecode...
-
OpenXLA Is Available Now
If you mean StableHLO, then it has an MLIR dialect: https://github.com/openxla/stablehlo/blob/main/stablehlo/dia....
In the StableHLO spec, we are talking about this in more abstract terms - "StableHLO opset" - to be able to unambiguously reason about the semantics of StableHLO programs. However, in practice the StableHLO dialect is the primary implementation of the opset at the moment.
I wrote "primary implementation" because e.g. there is also ongoing work on adding StableHLO support to the TFLite flatbuffer schema: https://github.com/tensorflow/tensorflow/blob/master/tensorf.... Having an abstract notion of the StableHLO opset enables us to have a source of truth that all the implementations correspond to.
wgpu-mm
-
Chrome Ships WebGPU
This is very exciting! (I had suspected it would slip to 114)
WebGPU implementations are still pretty immature, but certainly enough to get started with. I've been implementing a Rust + WebGPU ML runtime for the past few months and have enjoyed writing WGSL.
I recently got a 250M parameter LLM running in the browser without much optimisation and it performs pretty well! (https://twitter.com/fleetwood___/status/1638469392794091520)
That said, matmuls are still pretty handicapped in the browser (especially considering the bounds checking enforced in the browser). From my benchmarking I've struggled to hit 50% of theoretical FLOPS, which is cut down to 30% when the bounds checking comes in. (Benchmarks here: https://github.com/FL33TW00D/wgpu-mm)
I look forward to accessing shader cores as they mentioned in the post.
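The "50% of theoretical FLOPS" figure in the comment above can be read as achieved throughput divided by the device's peak throughput. Below is a minimal sketch of that arithmetic in Python, not the method used in the wgpu-mm benchmarks. The function names, the matrix size, the timing, and the peak figure are all hypothetical values chosen for illustration.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations for an (m x k) @ (k x n) matmul.

    Each of the m * n outputs needs k multiplies and k adds,
    so the conventional count is 2 * m * n * k.
    """
    return 2 * m * n * k


def utilization(m: int, n: int, k: int, seconds: float, peak_flops: float) -> float:
    """Fraction of peak throughput achieved by one timed matmul."""
    achieved = matmul_flops(m, n, k) / seconds  # FLOP per second
    return achieved / peak_flops


# Hypothetical inputs: a 1024 x 1024 x 1024 matmul timed at 0.5 ms
# on a device assumed to peak at 10 TFLOPS.
fraction = utilization(1024, 1024, 1024, seconds=0.5e-3, peak_flops=10e12)
print(f"achieved {fraction:.1%} of peak")
```

In practice the timing would come from averaging many GPU dispatches, and the peak figure from the vendor's specification for the data type in use.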
What are some alternatives?
wonnx - A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web
SHA256-WebGPU - Implementation of sha256 in WGSL
wgpu-py - Next generation GPU API for Python
iree - A retargetable MLIR-based machine learning compiler and runtime toolkit.
pygfx - A python render engine running on wgpu.
SHARK - SHARK - High Performance Machine Learning Distribution
tfjs - A WebGL accelerated JavaScript library for training and deploying ML models.
glare-core - C++ code used in various Glare Tech Ltd products
webgpu-blas - Fast matrix-matrix multiplication on web browser using WebGPU
mach - zig game engine & graphics toolkit
web-stable-diffusion - Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.