sd-extension-system-info
Vrmac
| | sd-extension-system-info | Vrmac |
|---|---|---|
| Mentions | 51 | 45 |
| Stars | 267 | 108 |
| Growth | - | - |
| Activity | 6.7 | 3.6 |
| Last commit | 6 days ago | almost 3 years ago |
| Language | Python | C# |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sd-extension-system-info
- RTX 4070 vs rx 7800 xt
-
AMD for AI
I've been using both SD and various LLMs on Linux without any issue and have done so for months. Windows support is also starting to roll out slowly, with koboldcpp-rocm recently giving me 20-25+ t/s for a 13B model even on Windows. You can see what SD performance is like on sites like these. Those numbers roughly match what I get on my RX6800 as well (8 t/s).
-
Stable Diffusion in pure C/C++
That seems a lot worse than a 2060 SUPER with PyTorch in A1111.
https://vladmandic.github.io/sd-extension-system-info/pages/... (search for 2060 SUPER)
-
Iterations per second benchmarking question
But A1111 users usually run the benchmark from this extension: https://github.com/vladmandic/sd-extension-system-info
-
Best AMD SD Guide for 2023?
AMD SD = Setup Disaster? It was quite troublesome googling the few Linux/amdgpu/ROCm/SD versions/configs/params posts online. Also, the whole PC may hang during generation, which is bad for the hard disk. Your card is way more powerful, so it may not hang like mine. People are getting 8 it/s: https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
-
Which one is better? Nvidia Tesla M40 vs Nvidia Tesla P4?
According to the system info benchmark, the M40 gets about 1-2 it/s and the P4 is barely better than that.
- Video card price/performance ratio
-
--medvram. Should I remove this flag? Running 3090
Anyway, to properly "benchmark" the impact of different switches on your image generation speed, it is better to use the benchmarking utility from the extension https://github.com/vladmandic/sd-extension-system-info (it also creates a very handy table of results from other users at https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html for you to compare with).
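As a rough illustration of what an it/s figure measures, here is a minimal Python sketch that times a stand-in workload. It is not the extension's benchmark code: `iterations_per_second` and `fake_step` are hypothetical names, and a real comparison of switches would time actual sampler steps instead.

```python
import time


def iterations_per_second(step, iterations, warmup=2):
    """Call `step` repeatedly and return the measured rate in iterations/second."""
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return iterations / elapsed if elapsed > 0 else float("inf")


def fake_step():
    # Hypothetical stand-in for one denoising step.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"{iterations_per_second(fake_step, iterations=20):.2f} it/s")
```

Running the same measurement with and without a given switch, on the same settings, is what makes the two numbers comparable.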
-
Searching for install guide for top performance setup on WSL2 (Automatic1111)
I can see that the top performance benchmark results on SD WebUI Benchmark Data (using RTX 4090), are obtained through WSL2 running Automatic1111 on a Linux dist and Python 3.10.11, along with PyTorch 2.1.0.dev+cu121 (like benchmark id: 4126)
-
Advice for Optimization on an RTX 8000
You should be able to compare based on the published benchmarks, just replicate the settings based on what's reported https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
Vrmac
-
New Renderers for GTK
A couple of times in the past I have implemented GPU-targeted GUI renderers; here’s an example: https://github.com/Const-me/Vrmac?tab=readme-ov-file#vector-... https://github.com/Const-me/Vrmac/blob/master/Vrmac/Draw/VAA...
2D graphics have very little in common with game engines. The problem is very different in many regards. In 2D, you generally have Bezier and other splines on input, a large amount of overdraw, and textures coming from users that complicate VRAM management. OTOH, game engines are solving hard problems which are irrelevant to 2D renderers, like dynamic lighting, volumetric effects, and dynamic environments.
-
Was Rust Worth It?
> Part of Panama
Most real-life C APIs use function pointers and/or complicated data structures. Here are a couple of real-life examples defined by the Linux kernel developers who made the V4L2 API: [0], [1] The first of them contains a union in the C version, i.e. different structures are at the same memory addresses. Note C# delivers a level of usability similar to C or C++: we simply define structures and access these fields. Not sure this is gonna be easy in Java even after all these proposals arrive.
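To make "different structures at the same memory addresses" concrete, here is a small sketch using Python's `ctypes` rather than C# or Java. The layout is hypothetical (`PixFormat`, `RawData`, `FormatUnion`, and `Format` are made-up names) and is not the real V4L2 definition from [0]; it only shows a tagged struct with an embedded union.

```python
import ctypes
import sys


class PixFormat(ctypes.Structure):
    # Hypothetical "typed" view of the payload.
    _fields_ = [("width", ctypes.c_uint32), ("height", ctypes.c_uint32)]


class RawData(ctypes.Structure):
    # Hypothetical raw-bytes view of the same payload.
    _fields_ = [("data", ctypes.c_uint8 * 8)]


class FormatUnion(ctypes.Union):
    # Both members start at offset 0 and share the same storage.
    _fields_ = [("pix", PixFormat), ("raw", RawData)]


class Format(ctypes.Structure):
    # A type tag followed by the union, as is common in C APIs.
    _fields_ = [("type", ctypes.c_uint32), ("fmt", FormatUnion)]


if __name__ == "__main__":
    f = Format()
    f.fmt.pix.width = 640
    f.fmt.pix.height = 480
    raw = bytes(f.fmt.raw.data)
    # The raw view sees the bytes written through the typed view.
    print(int.from_bytes(raw[:4], sys.byteorder), int.from_bytes(raw[4:], sys.byteorder))
```

Writing through `pix` and reading through `raw` touches the same eight bytes, which is the property an interop layer has to preserve without copying.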
For a managed runtime, unmanaged interop is a huge feature which affects all levels of the stack: type system in the language for value types, GC to be able to temporarily pin objects passed to native code (making copies is prohibitively slow for use cases like video processing), code generator to convert managed delegates to C function pointers and vice versa, error handling to automatically convert between exceptions and integer status codes at the API boundary, and more. Gonna be very hard to add to an existing language like Java.
> "Vector API" JEP
That API is not good. They don’t expose hardware instructions, instead they have invented some platform-agnostic API and implemented graceful degradation.
This means the applicability is likely to be limited to pure vertical operations processing FP32 or FP64 numbers. The rest of the SIMD instructions are too different between architectures. A simple example in C++ is [2], see [3] for the context. That example is trivial to port to modern C#, but impossible to port to Java even after the proposed changes. The key part of the implementation is psadbw instruction, which is very specific to SSE2/AVX2 and these vector APIs don’t have an equivalent. Apart from reduction, other problematic operations are shuffles, saturating integer math, and some memory access patterns (gathers in AVX2, transposed loads/stores on NEON).
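For readers unfamiliar with psadbw, the following is a scalar reference model of what the SSE2 form of the instruction computes, written in plain Python. It is not the code from [2] and it uses no SIMD; `psadbw_model` and `sum_bytes` are hypothetical names. It only illustrates why the instruction is handy for reductions: with an all-zero second operand, each 8-byte group collapses into the sum of its bytes.

```python
def psadbw_model(a, b):
    """Scalar model of SSE2 psadbw on two 16-byte operands.

    Returns two integers: the sum of absolute differences of the
    unsigned bytes in the low 8-byte group and in the high 8-byte group.
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("both operands must be exactly 16 bytes")
    return [
        sum(abs(x - y) for x, y in zip(a[i:i + 8], b[i:i + 8]))
        for i in (0, 8)
    ]


def sum_bytes(data):
    """Sum all bytes of `data`, 16 at a time, using the model above."""
    zero = bytes(16)
    # Zero padding does not change the sum.
    padded = data + bytes(-len(data) % 16)
    total = 0
    for offset in range(0, len(padded), 16):
        low, high = psadbw_model(padded[offset:offset + 16], zero)
        total += low + high
    return total


if __name__ == "__main__":
    print(sum_bytes(bytes(range(100))))
```

In real SIMD code the two partial sums stay in vector registers and are accumulated across iterations; the model only mirrors the arithmetic.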
> most of these are not done / not in a stable LTS Java release yet
BTW, SIMD intrinsics arrived in C# in 2019 (with .NET Core 3.0), and unmanaged interop support has been available since the very first 1.0 version.
[0] https://github.com/Const-me/Vrmac/blob/master/VrmacVideo/Lin...
[1] https://github.com/Const-me/Vrmac/blob/master/VrmacVideo/Lin...
[2] https://gist.github.com/Const-me/3ade77faad47f0fbb0538965ae7...
[3] https://news.ycombinator.com/item?id=36618344
-
Stable Diffusion in pure C/C++
I have minimal experience with Rust. OTOH, I have been programming C++ for a living since 2000, with a few gaps when I used other languages like Obj-C and C#.
I agree C++ is very hard to learn if you only have experience with higher-level languages like Python and Scala. I think there’re two reasons for that.
C++ is unsafe. There’s no way around this one, it was designed that way, like C or assembly. Still, with a modern toolset it’s not terribly bad. Compilers print warnings; BTW, I typically ask them to treat warnings as errors to deliberately fail the build. On Windows, a combination of debug build, debug C runtime, and the Visual Studio debugger helps tremendously. Linux compilers have these sanitizers (address, memory, thread, undefined behavior) which are comparable; they too sacrifice runtime speed for diagnostics and debuggability.
Another reason, the language itself is very complicated, especially the templates. However, just because something is in the language doesn’t mean it’s a good idea to use it. You don’t need to be familiar with that stuff unless doing something very advanced, like customizing the Eigen C++ library. Don’t follow the patterns found in the standard library: unlike your code, that library has good reasons to use that template BS. If instead of templates you do something else, C++ becomes much easier to use, and most importantly other people will still be able to read and understand your code. Another reason to avoid excessive template metaprogramming, it slows down the compiler, because template-heavy code often needs to be in headers as opposed to cpp files.
P.S. If you don’t need extreme levels of performance (defined as “approach the numbers listed in CPU specs”, the numbers are FLOPS or memory bandwidth), and you don’t need the ecosystem too much, consider C# instead of C++. Much faster than Python, often faster than Scala or Java, easy integration with C should you need that (same as Rust, much easier than Python or Java), the only downside is these ~100MB of the runtime. The reputation is weird, but technically the language and runtime are pretty good. For example, here’s a C# library which re-implements a subset of ffmpeg and libavcodec C libraries: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo
-
Media Player Element now available for cross-platform apps everywhere dotnet runs
BTW, I did that too for 32-bit ARM Linux on Raspberry Pi 4, back in 2020: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo Unlike Uno, my implementation doesn’t use libVLC and is written mostly in C#, only audio decoders are in C++. To decode video, I directly consume V4L2 Linux kernel APIs.
-
Ask HN: Those making $0/month or less on side projects – Show and tell
Doing that for decades.
An app for Windows phone, downloaded 140k times: https://github.com/Const-me/SkyFM
Cross-platform graphics library for .NET: https://github.com/Const-me/Vrmac
Recently, offline speech-to-text for Windows: https://github.com/Const-me/Whisper
At this point, I consider side projects like that as a hobby.
-
Minimal Cross-Platform Graphics
I think this needs much more complexity to be useful.
For the rendering, ideally it needs GPU support.
Input needs much more work, here's an overview for Windows: https://zserge.com/posts/fenster/
Windows' Sleep() function has a default resolution of 15.6 ms. That's not enough for realtime rendering and relatively hard to fix; ideally you need a modern OS and a waitable timer created with the high-resolution flag.
Here's my attempt at making something similar, a couple of years ago: https://github.com/Const-me/Vrmac
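The sleep granularity mentioned above can be checked empirically. This is a minimal sketch, assuming Python's `time.sleep` as a stand-in for the platform sleep call; `measure_sleep` is a hypothetical helper, and the numbers it reports depend on the OS, the Python version, and system load.

```python
import time


def measure_sleep(requested_s=0.001, samples=20):
    """Return (average, worst) measured duration of a requested sleep, in seconds."""
    durations = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_s)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations), max(durations)


if __name__ == "__main__":
    average, worst = measure_sleep()
    print(f"requested 1.000 ms, average {average * 1e3:.3f} ms, worst {worst * 1e3:.3f} ms")
```

If the measured average is far above the requested 1 ms, the sleep call alone cannot pace a render loop at typical frame rates.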
- An MP4 file first draft
-
Cppfront, Herb Sutter's proposal for a new C++ syntax
I agree about Python or PHP.
However, for Java or modern C#, in my experience the performance is often fairly close. When using either of them, very often one doesn’t need C++ to be good enough.
Here’s an example, a video player library for Raspberry Pi 4: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo As written on that page, just a few things are in C++ (GLES integration, audio decoders, and a couple of SIMD utility functions); the majority of things are in C#.
-
Vulkan update: version 1.2 conformance for Raspberry Pi 4
To be fair, in modern GL versions they fixed some of these things. In GLES 3.1 which I used a lot on Pi4 https://github.com/Const-me/Vrmac/ GPU vertex buffers and shaders worked fine, GLSL compiler in the drivers worked fine too.
However, other issues are still present. There’s no shader bytecode; they have an extension to grab compiled shaders from the GPU driver to cache on disk, but it doesn’t work. The only way to create shaders is separate compile and link API calls. The texture loading and binding API is still less than ideal.
- Advice for the next dozen Rust GUIs
What are some alternatives?
automatic - SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
neutralinojs - Portable and lightweight cross-platform desktop application development framework
tomesd - Speed up Stable Diffusion with this one simple trick!
nanovg - Antialiased 2D vector drawing library on top of OpenGL for UI and visualizations.
voltaML-fast-stable-diffusion - Beautiful and Easy to use Stable Diffusion WebUI
vello - An experimental GPU compute-centric 2D renderer.
stable-diffusion-webui-amdgpu - Stable Diffusion web UI
rapidyaml - Rapid YAML - a library to parse and emit YAML, and do it fast.
scribble-diffusion - Turn your rough sketch into a refined image using AI
sokol - minimal cross-platform standalone C headers
HIP - HIP: C++ Heterogeneous-Compute Interface for Portability
NanoGUI - Minimalistic GUI library for OpenGL