HIPCC
| | RyzenAI-SW | HIPCC |
|---|---|---|
| Mentions | 10 | 2 |
| Stars | 236 | 38 |
| Growth | 12.7% | - |
| Activity | 6.4 | 5.8 |
| Latest commit | 23 days ago | 17 days ago |
| Language | C++ | C++ |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
RyzenAI-SW
-
AMD unveils Ryzen Pro 8000-series processors
In the benchmark you have linked, you clearly see that the performance of the CPU-only implementation and the NPU implementation are identical.
https://github.com/amd/RyzenAI-SW/blob/main/example/transfor...
What this should tell you is that "15 TOPs" is an irrelevant number in this benchmark. There are exactly two FLOPs per parameter. Loading the parameters takes more time than processing them.
There are people with less than 8 GB of VRAM who can't load these models into their GPU and end up with the exact same performance as on CPU. The 12 TFLOPS of the 3060 Ti 8GB are "no good" for LLMs, because the bottleneck for token generation is memory bandwidth.
My Ryzen 2700 gets 7 tokens per second at 50 GFLOPS. What does this tell you? The NPU can saturate the memory bandwidth of the system.
Now here is the gotcha: Have you tried inputting very large prompts? Because that is where the speedup is going to be extremely noticeable. Instead of waiting minutes on a 2000 token prompt, it will be just as fast as on GPUs, because the initial prompt processing is compute bound.
Also, before calling something subpar, you're going to have to tell me how you are going to put larger models like Goliath 70b or 120b models on your GPU.
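The two bottlenecks described above can be sketched with back-of-the-envelope arithmetic: token generation streams every weight once per token (memory bound), while prefill does roughly two FLOPs per parameter per prompt token (compute bound). The model size, bandwidth, and throughput numbers below are illustrative assumptions, not measurements from the linked benchmark.

```python
def decode_tokens_per_sec(n_params, bytes_per_param, mem_bandwidth_gbs):
    """Each generated token must stream every weight once from memory,
    so throughput is capped at bandwidth / model size."""
    model_bytes = n_params * bytes_per_param
    return mem_bandwidth_gbs * 1e9 / model_bytes

def prefill_seconds(n_params, n_prompt_tokens, compute_flops):
    """Prefill does ~2 FLOPs per parameter per prompt token, and can be
    batched, so it is limited by compute rather than bandwidth."""
    total_flops = 2 * n_params * n_prompt_tokens
    return total_flops / compute_flops

# Hypothetical 7B model in 4-bit (~0.5 byte/param) on dual-channel
# DDR4 at ~40 GB/s: the memory system caps decode speed.
print(decode_tokens_per_sec(7e9, 0.5, 40))   # ~11.4 tokens/s ceiling

# A 2000-token prompt: 15 TOPS finishes in seconds, 50 GFLOPS in minutes.
print(prefill_seconds(7e9, 2000, 15e12))     # ~1.9 s
print(prefill_seconds(7e9, 2000, 50e9))      # ~560 s
```

This is why decode speed looks identical on CPU and NPU, yet the NPU's compute advantage shows up dramatically on long prompts.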
-
AMD Unveils Ryzen 8000G Series Processors: Zen 4 APUs for Desktop with Ryzen AI
Unfortunately, Ryzen AI, while neat, remains Windows-exclusive
https://github.com/amd/RyzenAI-SW/issues/2
- AMD announces Ryzen 8045HS, 8040HS and 8040U "Hawk Point" series powered by Zen4, RDNA3 and XDNA - VideoCardz.com
- AMD Wants To Know If You'd Like Ryzen AI Support On Linux - Please upvote here to have an AMD AI Linux driver
- AMD wants to know if you would like Ryzen AI support for Linux
-
AMD Wants to Know If You'd Like Ryzen AI Support on Linux
I mean, it barely even exists or seems to be acknowledged in Windows either. Somehow, they made it into a feature that OEMs can decide to disable, even with compatible CPUs. This issue (and the complete lack of info, save for the input of a very helpful employee) shows me the state of Ryzen AI:
https://github.com/amd/RyzenAI-SW/issues/5#issuecomment-1726...
-
Ryzen AI in 7940HS
Searching around showed other users with the same issue, and I found in the Ryzen AI footnote that this feature depends on the OEM to enable it. Github Issue
-
PCWorld: "Why AMD thinks Ryzen AI will be just as vital as CPUs and GPUs"
I noticed AMD just posted this github: https://github.com/amd/RyzenAI-cloud-to-client-demo
HIPCC
-
AMD Unveils Ryzen 8000G Series Processors: Zen 4 APUs for Desktop with Ryzen AI
Not sure if I completely understand what "Ryzen AI" does, but Tinygrad for example has some limited support for RDNA3[0]. It isn't quite there yet in matters of performance though, as you can read in the comments of that file.
There's also a small tutorial by AMD on how to use the WMMA intrinsic[1] using AMD's hipcc[2] compiler. Documentation is kinda sparse, but the instruction set is not huge. The RDNA3 ISA guide[3] might also be helpful (and only a fraction of the pages are relevant).
0. https://github.com/tinygrad/tinygrad/blob/master/extra/gemm/...
1. https://gpuopen.com/learn/wmma_on_rdna3/
2. https://github.com/ROCm/HIPCC
3. https://www.amd.com/content/dam/amd/en/documents/radeon-tech...
-
Intel CEO: 'The entire industry is motivated to eliminate the CUDA market'
> what would be the point for someone to add ROCm support to various pieces of software which currently require CUDA
It isn't just old cards though. CUDA is a point of centralization on a single provider, at a time when access to that provider's higher-end cards isn't even available, and that is causing people to look elsewhere.
ROCm supports CUDA through the included HIP projects...
https://github.com/ROCm/HIP
https://github.com/ROCm/HIPCC
https://github.com/ROCm/HIPIFY
The latter will regex-replace your CUDA methods with HIP methods. If it is as easy as running hipify on your codebase (or just coding to HIP APIs), it certainly makes sense to do so.
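The regex-replace approach can be sketched as a toy: a table-driven textual substitution from CUDA identifiers to their HIP equivalents. The real hipify tools ship a far larger mapping (and hipify-clang does actual parsing); the few entries below are illustrative, though the CUDA-to-HIP name pairs shown are the real ones.

```python
# Toy sketch of hipify-style source translation, assuming a small
# hand-picked mapping table. Not the actual hipify implementation.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def hipify(source: str) -> str:
    # Replace longest names first so cudaMemcpyHostToDevice isn't
    # partially clobbered by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

cuda_src = "#include <cuda_runtime.h>\n" \
           "cudaMalloc(&ptr, n); cudaMemcpy(ptr, h, n, cudaMemcpyHostToDevice);"
print(hipify(cuda_src))
```

Purely mechanical translation like this is why porting straightforward CUDA code to HIP is often close to free; the hard cases are vendor-specific intrinsics and libraries.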
What are some alternatives?
Cgml - GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.