> users interact with pytorch - not with hardware libraries. so, if pytorch can abstract the hardware, users wont care.
At the most basic level, yes (pretty much "hello world"). This is what I meant by "it's interesting to watch observers/casual users claim these implementations are competitive". Take a look at a project (nearly any project) and you will see plenty of ROCm-specific commits:
https://github.com/search?q=repo%3Ahuggingface%2Ftransformer...
https://github.com/search?q=repo%3AAUTOMATIC1111%2Fstable-di...
https://github.com/search?q=repo%3Avllm-project%2Fvllm+rocm&...
https://github.com/search?q=repo%3Aoobabooga%2Ftext-generati...
https://github.com/search?q=repo%3Amicrosoft%2FDeepSpeed+roc...
Check the dates - ROCm is six years old and all of these commits are /very/ recent.
Only the simplest projects are purely PyTorch; other than random curiosities, I'm not sure I've seen one in years.
Check the docs and note the caveats everywhere for ROCm, with feature-support tables full of asterisks. Repeat for nearly any project (check issues and pull requests while you're at it). Then do the same for CUDA and you'll see just how much hardware-specific and underlying software work is required.
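To be fair to the parent's point, the abstraction is real at the very top level: the same PyTorch user code runs on either vendor, because the ROCm build of PyTorch maps the `torch.cuda` API onto AMD's HIP runtime. A minimal, device-agnostic sketch (falls back to CPU when no GPU is present):

```python
import torch

# The "cuda" device string is vendor-neutral in practice: on a ROCm
# build of PyTorch it is backed by HIP, so this same script runs on
# NVIDIA or AMD GPUs (or on the CPU as a fallback).
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4, 4, device=device)
y = x @ x  # matmul dispatches to cuBLAS, rocBLAS, or a CPU kernel
print(y.shape, y.device)
```

The catch, as the commit histories above show, is everything beneath this surface: custom CUDA kernels, fused ops, and third-party extensions that each project has to port or feature-gate for ROCm separately.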
> all users will care about is dollar cost of doing their work.
Exactly. Check PyTorch issues.
ROCm:
https://github.com/pytorch/pytorch/issues?q=is%3Aissue+rocm
8,548 total issues.
CUDA:
19,692 total issues.
With Nvidia having 90% market share in AI, 80% market share on desktop, and support in torch since day one, those ratios are way off. For now and the foreseeable future, if you're a business (time isn't free), the total cost of an actual solution, from getting running, to training, to actually doing inference (especially at high production scale), very heavily favors Nvidia/CUDA. I've worked in this space for years, and at least once a month since the initial releases of ROCm on Vega in 2017 I check in on AMD/ROCm and can't believe how bad it is. I've spent many thousands of dollars on AMD hardware so that I can continually evaluate it; if ROCm were anywhere close to CUDA in terms of total cost I'd be deploying it. My AMD hardware just sits there, waiting over half a decade for ROCm to be practical.
I don't have some blind fealty to Nvidia, own any stock, or care what logo is stamped on the box. I'm just trying to get stuff done.
> further, almost everyone in the ecosystem has an incentive to commoditize the hardware (users, cloud vendors, etc). over time i see the moat eroding - as the moat does not attach directly to the user.
We're very much in agreement. Your key statement is "over time" and this is what I was referring to with 'I’m really rooting for them but the reality is these CUDA “competitors” have a very very long way to go.'. It's going to be a while...