HIP
-
Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP
>ROCm or HIP?
I'm not sure that's even the right question to ask. AFAIK, ROCm is the name of the entire tech stack, while HIP is AMD's equivalent of CUDA C++: they basically replicated the API and replaced every "cuda" with "hip", so there are functions called hipMalloc and hipMemcpy.
The repository is located at https://github.com/ROCm/HIP.
- HIP: Runtime API and Kernel Language for Portable Apps for AMD and Nvidia GPUs
-
Open-source project ZLUDA lets CUDA apps run on AMD GPUs
Is it perhaps because they want people to use HIP?
> HIP is very thin and has little or no performance impact over coding directly in CUDA mode.
> The HIPIFY tools automatically convert source from CUDA to HIP.
1. https://github.com/ROCm/HIP
-
AMD's Next GPU Is a 3D-Integrated Superchip
AMD has released HIP and a tool called HIPIFY, which kind of behaves like this but at the source level¹. Rather than trying to just translate CUDA to work on AMD compute, they are more focused on higher-level tooling.
Currently they seem to have a particular focus on AI frameworks and tools like PyTorch/TensorFlow/ONNX. They have sponsored and helped with a lot of PyTorch development, for example, so PyTorch support for AMD is much better than it was this time last year².
¹(https://github.com/ROCm/HIP)
²(https://pytorch.org/blog/experience-power-pytorch-2.0/)
-
Intel CEO: 'The entire industry is motivated to eliminate the CUDA market'
> what would be the point for someone to add ROCm support to various pieces of software which currently require CUDA
It isn't just old cards, though. CUDA is a point of centralization on a single provider, at a time when access to that provider's higher-end cards isn't even available, and that is causing people to look elsewhere.
ROCm supports CUDA through the included HIP projects...
https://github.com/ROCm/HIP
https://github.com/ROCm/HIPCC
https://github.com/ROCm/HIPIFY
The latter will regex-replace your CUDA methods with HIP methods. If it is as easy as running HIPIFY on your codebase (or just coding to the HIP APIs), it certainly makes sense to do so.
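To make the "regex replace" idea concrete, here's a toy C++ sketch of what hipify-perl does at its simplest: purely textual renames of CUDA runtime identifiers to their HIP equivalents. This is my own illustration, not the actual tool; the real hipify-perl and hipify-clang handle far more cases and edge conditions.

```cpp
#include <regex>
#include <string>
#include <utility>

// Rename a few well-known CUDA runtime API identifiers to their HIP
// counterparts. The mapping is 1:1 for most of the runtime API.
std::string hipifySketch( std::string src )
{
    static const std::pair<const char*, const char*> renames[] =
    {
        { "cudaMalloc", "hipMalloc" },
        { "cudaMemcpy", "hipMemcpy" },   // also rewrites cudaMemcpyHostToDevice, etc.
        { "cudaFree", "hipFree" },
    };
    for( const auto& r : renames )
        src = std::regex_replace( src, std::regex( r.first ), r.second );
    return src;
}
```

For example, `cudaMemcpy(p, h, n, cudaMemcpyHostToDevice)` becomes `hipMemcpy(p, h, n, hipMemcpyHostToDevice)`; the enum constant is covered by the same prefix rename.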
-
Nvidia on the Mountaintop
AMD's equivalent is HIP [1], for sufficiently flexible definitions of "equivalent". I can't speak to how complete/correct/performant it is (I'm just a guy running tutorial/toy-level ML stuff on an RDNA1 card), but part of AMD's problem is that it might not practically matter how well they do this because the broader ecosystem support specifically for the CUDA stack is so entrenched.
[1] https://github.com/ROCm-Developer-Tools/HIP
- Stable Diffusion in pure C/C++
- Would love to hear your information and knowledge to simplify my understanding on AMD's positioning in the AI market
-
Ask HN: C++ still dominates on GPUs, why not Rust?
From what I know, modern GPUs are still programmed almost exclusively in C++. See CUDA [0] for Nvidia and ROCm [1] for AMD.
Why is this? Why is Rust not loved there?
[0] https://docs.nvidia.com/cuda/
[1] https://github.com/ROCm-Developer-Tools/HIP
-
[P] RWKV C++ Cuda library with no dependencies, no torch, and no python
Go ahead and try to ship ROCm code that works on multiple consumer graphics cards on Linux, macOS, and Windows. As an example of how much AMD cares about it, the installation notes linked from the readme return a 404.
Vrmac
-
New Renderers for GTK
Couple times in the past I have implemented GPU-targeted GUI renderers, here’s an example: https://github.com/Const-me/Vrmac?tab=readme-ov-file#vector-... https://github.com/Const-me/Vrmac/blob/master/Vrmac/Draw/VAA...
2D graphics have very little in common with game engines; the problem is very different in many regards. In 2D, you generally have Bézier and other splines on input, a large amount of overdraw, and textures coming from users, which complicates VRAM memory management. OTOH, game engines are solving hard problems which are irrelevant to 2D renderers, like dynamic lighting, volumetric effects, and dynamic environments.
-
Was Rust Worth It?
> Part of Panama
Most real-life C APIs use function pointers and/or complicated data structures. Here are a couple of real-life examples defined by the Linux kernel developers who made the V4L2 API: [0], [1]. The first of them contains a union in the C version, i.e. different structures are at the same memory addresses. Note that C# delivers a level of usability similar to C or C++: we simply define the structures and access the fields. I'm not sure this is gonna be easy in Java even after all these proposals arrive.
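Here's a toy C++ sketch of the pattern those V4L2 structures use: a tag field selects which member of a union is meaningful, so several different layouts share the same memory. The field names here are hypothetical, not the real kernel layout.

```cpp
#include <cstdint>

// A struct in the style of v4l2_format: `type` says which union member
// is active; all members start at the same address.
struct Format
{
    uint32_t type;   // selects the active union member
    union
    {
        struct { uint32_t width, height; } pix;  // pixel-format view
        struct { uint32_t left, top; } win;      // window view
        uint8_t raw[ 16 ];                       // padding/raw-bytes view
    } fmt;
};
```

An interop layer has to reproduce this overlapping layout exactly, which is easy with C# explicit-layout structs but awkward in a language without value types.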
For a managed runtime, unmanaged interop is a huge feature which affects all levels of the stack: the type system in the language for value types, the GC being able to temporarily pin objects passed to native code (making copies is prohibitively slow for use cases like video processing), the code generator converting managed delegates to C function pointers and vice versa, error handling automatically converting between exceptions and integer status codes at the API boundary, and more. It's gonna be very hard to add to an existing language like Java.
> "Vector API" JEP
That API is not good. It doesn't expose hardware instructions; instead they invented a platform-agnostic API and implemented graceful degradation.
This means the applicability is likely to be limited to pure vertical operations processing FP32 or FP64 numbers. The rest of the SIMD instructions are too different between architectures. A simple example in C++ is [2], see [3] for the context. That example is trivial to port to modern C#, but impossible to port to Java even after the proposed changes. The key part of the implementation is psadbw instruction, which is very specific to SSE2/AVX2 and these vector APIs don’t have an equivalent. Apart from reduction, other problematic operations are shuffles, saturating integer math, and some memory access patterns (gathers in AVX2, transposed loads/stores on NEON).
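To make the psadbw point concrete, here's a minimal SSE2 sketch (my own illustration, not taken from the linked gist) of a horizontal byte-sum reduction built on _mm_sad_epu8, the intrinsic for psadbw. Computing sums of absolute differences against zero yields two partial byte sums, one per 64-bit half of the register:

```cpp
#include <immintrin.h>
#include <cstdint>

// Sum 16 bytes using a single psadbw instruction plus one scalar add.
uint32_t sumBytes16( const uint8_t* p )
{
    const __m128i v = _mm_loadu_si128( (const __m128i*)p );
    // SAD against zero = sum of bytes, per 64-bit lane
    const __m128i sad = _mm_sad_epu8( v, _mm_setzero_si128() );
    // Low lane's sum is in the low 16 bits; the high lane's sum is at
    // 16-bit element index 4. Each fits in 16 bits (max 8 * 255 = 2040).
    return (uint32_t)( _mm_cvtsi128_si32( sad ) + _mm_extract_epi16( sad, 4 ) );
}
```

There's no direct equivalent of this reduction in the platform-agnostic vector APIs, which is exactly the applicability limit described above.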
> most of these are not done / not in a stable LTS Java release yet
BTW, SIMD intrinsics arrived to C# in 2019 (.NET Core 3.0 released in 2019), and unmanaged interop support is available since the very first 1.0 version.
[0] https://github.com/Const-me/Vrmac/blob/master/VrmacVideo/Lin...
[1] https://github.com/Const-me/Vrmac/blob/master/VrmacVideo/Lin...
[2] https://gist.github.com/Const-me/3ade77faad47f0fbb0538965ae7...
[3] https://news.ycombinator.com/item?id=36618344
-
Stable Diffusion in pure C/C++
I have minimal experience with Rust. OTOH, I've been programming C++ for a living since 2000, with a few gaps when I used other languages like Obj-C and C#.
I agree C++ is very hard to learn if you only have experience with higher-level languages like Python and Scala. I think there are two reasons for that.
C++ is unsafe. There's no way around this one; it was designed that way, like C or assembly. Still, with a modern toolset it's not terribly bad. Compilers print warnings; BTW, I typically ask them to treat warnings as errors to deliberately fail the build. On Windows, the combination of a debug build, the debug C runtime, and the Visual Studio debugger helps tremendously. Linux compilers have these sanitizers (address, memory, thread, undefined behavior) which are comparable; they too sacrifice runtime speed for diagnostics and debuggability.
Another reason is that the language itself is very complicated, especially the templates. However, just because something is in the language doesn't mean it's a good idea to use it. You don't need to be familiar with that stuff unless you're doing something very advanced, like customizing the Eigen C++ library. Don't follow the patterns found in the standard library: unlike your code, that library has good reasons to use that template BS. If you do something else instead of templates, C++ becomes much easier to use, and most importantly, other people will still be able to read and understand your code. Another reason to avoid excessive template metaprogramming is that it slows down the compiler, because template-heavy code often needs to be in headers as opposed to cpp files.
P.S. If you don’t need extreme levels of performance (defined as “approaching the numbers listed in CPU specs”, i.e. FLOPS or memory bandwidth), and you don’t need the ecosystem too much, consider C# instead of C++. It's much faster than Python, often faster than Scala or Java, with easy integration with C should you need that (same as Rust, much easier than Python or Java); the only downside is the ~100MB runtime. The reputation is weird, but technically the language and runtime are pretty good. For example, here’s a C# library which re-implements a subset of the ffmpeg and libavcodec C libraries: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo
-
Media Player Element now available for cross-platform apps everywhere dotnet runs
BTW, I did that too for 32-bit ARM Linux on Raspberry Pi 4, back in 2020: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo Unlike Uno, my implementation doesn’t use libVLC and is written mostly in C#, only audio decoders are in C++. To decode video, I directly consume V4L2 Linux kernel APIs.
-
Ask HN: Those making $0/month or less on side projects – Show and tell
I've been doing that for decades.
An app for Windows phone, downloaded 140k times: https://github.com/Const-me/SkyFM
Cross-platform graphics library for .NET: https://github.com/Const-me/Vrmac
Recently, offline speech-to-text for Windows: https://github.com/Const-me/Whisper
At this point, I consider side projects like that as a hobby.
-
Minimal Cross-Platform Graphics
I think this needs much more complexity to be useful.
For the rendering, ideally it needs GPU support.
Input needs much more work, here's an overview for Windows: https://zserge.com/posts/fenster/
Windows' Sleep() function has a default resolution of 15.6ms; that's not enough for realtime rendering, and it's relatively hard to fix: ideally you need a modern OS and a waitable timer created with the high-resolution flag.
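A portable way to observe actual sleep granularity is to time a nominal 1ms sleep (my own sketch, not from the linked code; the Windows-specific fix itself is CreateWaitableTimerExW with the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag, not shown here):

```cpp
#include <chrono>
#include <thread>

// Measure how long a nominal 1 ms sleep actually takes, in milliseconds.
// On Windows with the default 15.6 ms timer resolution this can come back
// around 15 ms; on most Linux systems it stays close to 1 ms.
double measureSleepMs()
{
    using namespace std::chrono;
    const auto t0 = steady_clock::now();
    std::this_thread::sleep_for( milliseconds( 1 ) );
    const auto t1 = steady_clock::now();
    return duration<double, std::milli>( t1 - t0 ).count();
}
```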
Here's my attempt at making something similar, from a couple of years ago: https://github.com/Const-me/Vrmac
- An MP4 file first draft
-
Cppfront, Herb Sutter's proposal for a new C++ syntax
I agree about Python or PHP.
However, for Java or modern C#, in my experience the performance is often fairly close. Very often either of them is good enough, and one doesn't need C++ at all.
Here’s an example, a video player library for Raspberry Pi4: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo As written on that page, just a few things are in C++ (GLES integration, audio decoders, and couple SIMD utility functions), the majority of things are in C#.
-
Vulkan update: version 1.2 conformance for Raspberry Pi 4
To be fair, in modern GL versions they fixed some of these things. In GLES 3.1, which I used a lot on the Pi4 (https://github.com/Const-me/Vrmac/), GPU vertex buffers and shaders worked fine, and the GLSL compiler in the drivers worked fine too.
However, other issues are still present. There's no shader bytecode: they have an extension to grab compiled shaders from the GPU driver to cache on disk, but it doesn't work. The only way to create shaders is separate compile and link API calls. The texture loading and binding API is still less than ideal.
- Advice for the next dozen Rust GUIs
What are some alternatives?
AdaptiveCpp - Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
neutralinojs - Portable and lightweight cross-platform desktop application development framework
ZLUDA - CUDA on AMD GPUs
nanovg - Antialiased 2D vector drawing library on top of OpenGL for UI and visualizations.
futhark - A data-parallel functional programming language
vello - An experimental GPU compute-centric 2D renderer.
kompute - General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing usecases. Backed by the Linux Foundation.
rapidyaml - Rapid YAML - a library to parse and emit YAML, and do it fast.
ginkgo - Numerical linear algebra software package
sokol - minimal cross-platform standalone C headers
rocm-arch - A collection of Arch Linux PKGBUILDS for the ROCm platform
NanoGUI - Minimalistic GUI library for OpenGL