AMD's CDNA 3 Compute Architecture

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • Whisper

    High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model (by Const-me)

  • Couple times in the past I wanted to port open source ML models from CUDA/Python to a better technology stack. I have ported Whisper https://github.com/Const-me/Whisper/ and Mistral https://github.com/Const-me/Cgml/ to D3D11. I don’t remember how much time I spent, but given both were unpaid part-time hobby projects, probably under 160 hours / each.

    These software projects were great to validate the technology choices, but note I only did bare minimum to implement specific ML models. Implementing a complete PyTorch backend gonna involve dramatically more work. I can’t even estimate how much more because I’m not an expert in Python or these Python-based ML libraries.

  • Cgml

    GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.

  • Couple times in the past I wanted to port open source ML models from CUDA/Python to a better technology stack. I have ported Whisper https://github.com/Const-me/Whisper/ and Mistral https://github.com/Const-me/Cgml/ to D3D11. I don’t remember how much time I spent, but given both were unpaid part-time hobby projects, probably under 160 hours / each.

    These software projects were great to validate the technology choices, but note I only did bare minimum to implement specific ML models. Implementing a complete PyTorch backend gonna involve dramatically more work. I can’t even estimate how much more because I’m not an expert in Python or these Python-based ML libraries.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • clspv

    Clspv is a compiler for OpenCL C to Vulkan compute shaders

  • Vulkan Compute backends for numerical compute (as typified by both OpenCL and SYCL) are challenging, you can look at Google's cspv https://github.com/google/clspv project for the nitty gritty details. The lowest-effort path is actually via some combination of Rocm (for hardware that AMD bothers to support themselves) and the Mesa project's Rusticl backend (for everything else).

  • Bolt

    Bolt is a C++ template library optimized for GPUs. Bolt provides high-performance library implementations for common algorithms such as scan, reduce, transform, and sort. (by HSA-Libraries)

  • this is frankly starting to sound a lot like the ridiculous "blue bubbles" discourse.

    AMD's products have generally failed to catch traction because their implementations are halfassed and buggy and incomplete (despite promising more features, these are often paper features or career-oriented development from now-departed developers). all of the same "developer B" stuff from openGL really applies to openCL as well.

    http://richg42.blogspot.com/2014/05/the-truth-on-opengl-driv...

    AMD has left a trail of abandoned code and disappointed developers in their wake. These two repos are the same thing for AMD's ecosystem and NVIDIA's ecosystem, how do you think the support story compares?

    https://github.com/HSA-Libraries/Bolt

    https://github.com/NVIDIA/thrust

    in the last few years they have (once again) dumped everything and started over, ROCm supported essentially no consumer cards and rotated support rapidly even in the CDNA world. It offers no binary compatibility support story, it has to be compiled for specific chips within a generation, not even just "RDNA3" but "Navi 31 specifically". Etc etc. And nobody with consumer cards could access it until like, six months ago, and that still is only on windows, consumer cards are not even supported on linux (!).

    https://geohot.github.io/blog/jekyll/update/2023/06/07/a-div...

    This is on top of the actual problems that still remain, as geohot found out. Installing ROCm is a several-hour process that will involve debugging the platform just to get it to install, and then you will probably find that the actual code demos segfault when you run them.

    AMD's development processes are not really open, and actual development is silo'd inside the company with quarterly code dumps outside. The current code is not guaranteed to run on the actual driver itself, they do not test it even in the supported configurations.

    it hasn't got traction because it's a low-quality product and nobody can even access it and run it anyway.

  • Thrust

    Discontinued [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

  • this is frankly starting to sound a lot like the ridiculous "blue bubbles" discourse.

    AMD's products have generally failed to catch traction because their implementations are halfassed and buggy and incomplete (despite promising more features, these are often paper features or career-oriented development from now-departed developers). all of the same "developer B" stuff from openGL really applies to openCL as well.

    http://richg42.blogspot.com/2014/05/the-truth-on-opengl-driv...

    AMD has left a trail of abandoned code and disappointed developers in their wake. These two repos are the same thing for AMD's ecosystem and NVIDIA's ecosystem, how do you think the support story compares?

    https://github.com/HSA-Libraries/Bolt

    https://github.com/NVIDIA/thrust

    in the last few years they have (once again) dumped everything and started over, ROCm supported essentially no consumer cards and rotated support rapidly even in the CDNA world. It offers no binary compatibility support story, it has to be compiled for specific chips within a generation, not even just "RDNA3" but "Navi 31 specifically". Etc etc. And nobody with consumer cards could access it until like, six months ago, and that still is only on windows, consumer cards are not even supported on linux (!).

    https://geohot.github.io/blog/jekyll/update/2023/06/07/a-div...

    This is on top of the actual problems that still remain, as geohot found out. Installing ROCm is a several-hour process that will involve debugging the platform just to get it to install, and then you will probably find that the actual code demos segfault when you run them.

    AMD's development processes are not really open, and actual development is silo'd inside the company with quarterly code dumps outside. The current code is not guaranteed to run on the actual driver itself, they do not test it even in the supported configurations.

    it hasn't got traction because it's a low-quality product and nobody can even access it and run it anyway.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts