sd-extension-system-info vs HIP
| sd-extension-system-info | HIP |
---|---|---|
Mentions | 51 | 29 |
Stars | 258 | 3,453 |
Growth | - | 1.2% |
Activity | 9.3 | 8.9 |
Last commit | 3 months ago | 6 days ago |
Language | Python | C++ |
License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sd-extension-system-info
- RTX 4070 vs rx 7800 xt
-
AMD for AI
I've been using both SD and various LLMs on Linux without any issue, and have done so for months. Windows support is also starting to roll out slowly, with koboldcpp-rocm recently giving me 20-25+ t/s for a 13B model even on Windows. You can see what SD performance is like on sites like these. Those numbers roughly match what I get on my RX 6800 as well (8t/s).
-
Stable Diffusion in pure C/C++
That seems a lot worse than a 2060 SUPER with PyTorch in A1111.
https://vladmandic.github.io/sd-extension-system-info/pages/... (search for 2060 SUPER)
-
Iterations per second benchmarking question
But A1111 users usually run the benchmark from this extension: https://github.com/vladmandic/sd-extension-system-info
-
Best AMD SD Guide for 2023?
AMD SD = Setup Disaster? It was quite troublesome googling the few linux/amdgpu/rocm/sd vers/configs/params posts online. Also, the whole PC may hang during generation, which is bad for the hard disk. Your card is way more powerful, so it may not hang like mine. People are getting 8it/s https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
-
Which one is better? Nvidia Tesla M40 vs Nvidia Tesla P4?
According to system info benchmark, M40 is like 1-2 it/s and P4 is barely better than that.
- Video card price/performance ratio
-
--medvram. Should I remove this flag? Running 3090
Anyway, to properly "benchmark" the impact of different switches on your image generation speed, it is better to use the benchmarking utility from the extension https://github.com/vladmandic/sd-extension-system-info (it also creates a very handy table of results from other users at https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html for you to compare with).
-
Searching for install guide for top performance setup on WSL2 (Automatic1111)
I can see that the top performance benchmark results on SD WebUI Benchmark Data (using RTX 4090), are obtained through WSL2 running Automatic1111 on a Linux dist and Python 3.10.11, along with PyTorch 2.1.0.dev+cu121 (like benchmark id: 4126)
-
Advice for Optimization on an RTX 8000
You should be able to compare based on the published benchmarks, just replicate the settings based on what's reported https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
HIP
- Hip: Runtime API and Kernel Language for Portable Apps for AMD and Nvidia GPUs
-
Open-source project ZLUDA lets CUDA apps run on AMD GPUs
Is it perhaps because they want people to use HIP?
> HIP is very thin and has little or no performance impact over coding directly in CUDA mode.
> The HIPIFY tools automatically convert source from CUDA to HIP.
1. https://github.com/ROCm/HIP
-
AMD's Next GPU Is a 3D-Integrated Superchip
AMD has released HIP and a tool called HIPIFY which kind of behaves like this but at the source level¹. Rather than try and just translate CUDA to work on AMD compute they are more focused on higher level tooling.
Currently they seem to have a particular focus on AI frameworks and tools like PyTorch/TensorFlow/ONNX. They have sponsored and helped with a lot of PyTorch development, for example, so PyTorch support for AMD is much better than it was this time last year².
¹(https://github.com/ROCm/HIP)
²(https://pytorch.org/blog/experience-power-pytorch-2.0/)
-
Intel CEO: 'The entire industry is motivated to eliminate the CUDA market'
> what would be the point for someone to add ROCm support to various pieces of software which currently require CUDA
It isn't just old cards though. CUDA is a point of centralization on a single provider, during a time when that provider's higher-end cards aren't even available, and that is causing people to look elsewhere.
ROCm supports CUDA through the included HIP projects...
https://github.com/ROCm/HIP
https://github.com/ROCm/HIPCC
https://github.com/ROCm/HIPIFY
The latter will regex replace your CUDA methods with HIP methods. If it is as easy as running hipify on your codebase (or just coding to HIP APIs), it certainly makes sense to do so.
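To make the "regex replace" idea above concrete, here is a toy sketch in Python. It is not the HIPIFY tool and makes no claim about how HIPIFY is implemented; the function name `toy_hipify` and the small rename table are assumptions chosen for illustration. The HIP names on the right-hand side are real HIP runtime identifiers, but the table covers only a handful of calls.

```python
import re

# Illustrative subset of CUDA-to-HIP renames (assumption: a real port
# needs far more entries than this, plus handling for kernel launches).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Longest names first, and whole-word matches only, so that
# cudaMemcpyHostToDevice is never partially rewritten as cudaMemcpy.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


# Hypothetical CUDA host code used only as sample input.
cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

if __name__ == "__main__":
    print(toy_hipify(cuda_snippet))
```

The point of the sketch is only that much of the CUDA runtime API has a like-named HIP counterpart, which is why a source-level rename can get a codebase most of the way there.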
-
Nvidia on the Mountaintop
AMD's equivalent is HIP [1], for sufficiently flexible definitions of "equivalent". I can't speak to how complete/correct/performant it is (I'm just a guy running tutorial/toy-level ML stuff on an RDNA1 card), but part of AMD's problem is that it might not practically matter how well they do this because the broader ecosystem support specifically for the CUDA stack is so entrenched.
[1] https://github.com/ROCm-Developer-Tools/HIP
- Stable Diffusion in pure C/C++
- Would love to hear your information and knowledge to simplify my understanding on AMD's positioning in the AI market
-
Ask HN: C++ still dominates on GPUs, why not Rust?
From what I know, modern GPUs are still programmed with C++ exclusively. See CUDA [0] for Nvidia and ROCm [1] for AMD.
Why is this? Why is Rust not loved there?
[0] https://docs.nvidia.com/cuda/
[1] https://github.com/ROCm-Developer-Tools/HIP
-
[P] RWKV C++ Cuda library with no dependencies, no torch, and no python
Go ahead and try to ship ROCm code that works on multiple consumer graphics cards on Linux, macOS, and Windows. As an example of how much AMD cares about it, the installation notes linked to in the readme return a 404.
-
Someone found a ROCm 5.5 RC Docker Container that works on 7000 series GPUs
The big whoop for ROCm is that AMD invested a considerable amount of engineering time and talent into a tool they call HIP. Basically, it's an analysis tool that does its best to port proprietary Nvidia CUDA-style code - which due to various smelly reasons rules the roost - to code that can happily run on AMD graphics cards, and presumably others. Intel has a similar thing going with oneAPI. They've done this whilst working on porting a lot of their code base to the Linux AMDGPU open source kernel driver, as well.
What are some alternatives?
automatic - SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
AdaptiveCpp - Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
tomesd - Speed up Stable Diffusion with this one simple trick!
ZLUDA - CUDA on AMD GPUs
voltaML-fast-stable-diffusion - Beautiful and Easy to use Stable Diffusion WebUI
futhark - A data-parallel functional programming language
stable-diffusion-webui-directml - Stable Diffusion web UI
kompute - General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
scribble-diffusion - Turn your rough sketch into a refined image using AI
ginkgo - Numerical linear algebra software package
tinygrad - You like pytorch? You like micrograd? You love tinygrad! ❤️
rocm-arch - A collection of Arch Linux PKGBUILDS for the ROCm platform