Ask HN: What is an AI chip and how does it work?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • gemm-benchmark

    Simple [sd]gemm benchmark, similar to ACES dgemm

  • Apple Silicon Macs have special matrix multiplication units (AMX) that can do matrix multiplication fast and with low energy requirements [1]. These AMX units can often beat matrix multiplication on AMD/Intel CPUs (especially those without a very large number of cores). Since a lot of linear algebra code uses matrix multiplication, and using the AMX units is only a matter of linking against Accelerate (for its BLAS interface), a lot of software that uses BLAS is faster on Apple Silicon Macs.

    That said, the GPUs in your M1 Mac are faster than the AMX units and any reasonably modern NVIDIA GPU will wipe the floor with the AMX units or Apple Silicon GPUs in raw compute. However, a lot of software does not use CUDA by default and for small problem sets AMX units or CPUs with just AVX can be faster because they don't incur the cost of data transfers from main memory to GPU memory and vice versa.

    [1] Benchmarks:

    https://github.com/danieldk/gemm-benchmark#example-results

    https://explosion.ai/blog/metal-performance-shaders (scroll down a bit for AMX and MPS numbers)
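To make the "only a matter of linking against Accelerate" point concrete, here is a minimal sketch. It assumes a NumPy build whose BLAS backend is Accelerate (as is common on Apple Silicon); in that case a plain single-precision matmul dispatches to sgemm and can run on the AMX units with no code changes.

```python
import numpy as np

# If NumPy is linked against Accelerate, this matmul goes through the
# BLAS sgemm routine and can be executed on the AMX units.
n = 512
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n), dtype=np.float32)
b = rng.standard_normal((n, n), dtype=np.float32)
c = a @ b  # dispatched to BLAS sgemm under the hood
print(c.shape, c.dtype)
```

The same code runs unchanged against OpenBLAS or MKL; which backend (and hence which hardware unit) does the work is decided entirely at link time.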

  • tensorflow

    An Open Source Machine Learning Framework for Everyone

  • This is indeed the bread and butter, but all sorts of standard linear algebra algorithms are used as well. You can check the various XLA-related (accelerated linear algebra) folders in tensorflow, or the torch folders in pytorch, to see what is used [1], [2]

    [1] https://github.com/tensorflow/tensorflow/tree/8d9b35f442045b...

    [2] https://github.com/pytorch/pytorch/blob/6e3e3dd477e0fb9768ee...
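As a small illustration of the "standard linear algebra algorithms" beyond plain matmul, here is a hedged sketch using NumPy's LAPACK bindings; the decompositions chosen (Cholesky, QR, SVD) are representative examples, not a list taken from the linked folders.

```python
import numpy as np

# A few classic dense linear algebra kernels that ML frameworks rely on,
# exercised via NumPy's LAPACK bindings.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
spd = a @ a.T + 4 * np.eye(4)  # symmetric positive-definite matrix
l = np.linalg.cholesky(spd)    # Cholesky factorization: spd = l @ l.T
q, r = np.linalg.qr(a)         # QR decomposition: a = q @ r
u, s, vt = np.linalg.svd(a)    # singular value decomposition
print(np.allclose(l @ l.T, spd), np.allclose(q @ r, a))
```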

  • Pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • rknn-toolkit

  • Well, I already said it :P The RK3588's NPU can only do convolutions and a small list of activation functions. Realistically it's usable only on image-like (2D, 3D) data. The software side is pretty weird: it's amazing how many models they support despite the limited fixed-function hardware, but there has been literally zero development in the last 6 months and it has a lot of bugs that don't seem hard to fix. Even for images, it doesn't support changing the resolution of the input (you need to "recompile the model" for that), which is super weird since the hardware pipeline doesn't care much about the size.

    Anyway, I really don't recommend it unless you're making your own model, you know beforehand what is and isn't supported, and your input is a fixed resolution (which is a pretty fair constraint in an embedded system). Fixed means it doesn't change every frame; handling a hotplug from one webcam at one resolution to another at a different resolution is fine.

    I think looking at the examples gives you a reasonable sense of what it can do: https://github.com/rockchip-linux/rknn-toolkit/tree/master/e...
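To see why fixed input resolution matters, here is a minimal sketch of the kind of 2D convolution such an NPU accelerates (a naive direct "valid" convolution in NumPy, purely illustrative and not rknn-toolkit code): the output shape, and hence the compiled dataflow, is derived from the input shape, which is why a shape change forces recompiling the model.

```python
import numpy as np

def conv2d(img, kernel):
    # Naive direct 2D convolution, "valid" mode (no padding).
    # Note the loop bounds are derived from the input shape: a compiled
    # fixed-function pipeline bakes these in at model-build time.
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])  # simple horizontal-difference kernel
print(conv2d(img, edge))        # each output is img[i, j] - img[i, j + 1]
```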
