open-gpu-kernel-modules vs Pytorch

open-gpu-kernel-modules

NVIDIA Linux open GPU with P2P support (by tinygrad)

Pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration (by pytorch)

Deep Learning neural-network Autograd GPU Numpy Tensor Python Machine Learning

pytorch.org

Project         open-gpu-kernel-modules                     Pytorch
Mentions        2                                           348
Stars           759                                         79,328
Growth          9.6%                                        1.7%
Activity        7.3                                         10.0
Latest Commit   7 days ago                                  3 days ago
Language        C                                           Python
License         GNU General Public License v3.0 or later    BSD 1-Clause License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
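
The page does not publish the formulas behind these metrics, so the following is only a sketch of how such numbers could be computed. The exponential half-life weighting, the percentile scaling, and all function names and sample values are assumptions made for illustration, not the site's actual method.

```python
from bisect import bisect_left


def month_over_month_growth(stars_prev: int, stars_now: int) -> float:
    """Percentage change in stars relative to the previous month."""
    if stars_prev <= 0:
        raise ValueError("previous star count must be positive")
    return 100.0 * (stars_now - stars_prev) / stars_prev


def recency_weighted_commits(commit_ages_days, half_life_days=30.0):
    """Sum of commit weights; a commit's weight halves every half_life_days.

    The half-life value is a hypothetical choice, not a documented one.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(raw, all_raw_values):
    """Scale a raw activity value to 0-10 by its percentile among all projects."""
    ranked = sorted(all_raw_values)
    below = bisect_left(ranked, raw)  # number of projects strictly below `raw`
    return round(10.0 * below / len(ranked), 1)


# Hypothetical inputs, for illustration only.
growth = month_over_month_growth(1000, 1050)
weighted = recency_weighted_commits([0, 30])
score = activity_score(90, range(100))
```

Under this scaling, a project that ranks ahead of 90% of the tracked projects scores 9.0, which matches the "top 10%" reading given above.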

open-gpu-kernel-modules

Posts with mentions or reviews of open-gpu-kernel-modules. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-06-12.

How Meta trains large language models at scale
3 projects | news.ycombinator.com | 12 Jun 2024

Who is "they"?
RTX 4090s are terrible for this task. Off the top of my head:
- VRAM (obviously). Isn't that where the racks come in? Not really. Nvidia famously removed something as basic as NVLink between two cards from the 3090 to the 4090. When it comes to bandwidth between cards (crucial) even 16 lanes of PCIe 4 isn't fast enough. When you start talking about "racks" unless you're running on server grade CPUs (contributing to cost vs power vs density vs perf) you're not going to have nearly enough PCIe lanes to get very far. Even P2P over PCIe requires a hack geohot developed[0] and needless to say that's umm, less than confidence inspiring for what you would lay out ($$$) in terms of hardware, space, cooling, and power. The lack of ECC is a real issue as well.
- Form factor. Remember PCIe lanes, etc? The RTX 4090 is a ~three slot beast when using air cooling and needless to say rigging up something like the dual slot water cooled 4090s I have at scale is another challenge altogether... How are people going to wire this up? What do the enclosures/racks/etc look like? This isn't like crypto mining where cheap PCIe 1x risers can be used without dramatically limiting performance.
- Performance. As grandparent comment noted 4090s are not designed for this workload. In typical usage for training I see them as 10-20% faster than an RTX 3090 at much higher cost. Compared to my H100 with SXM4 it's ridiculously slow.
- Market segmentation. Nvidia really knows what they're doing here... There are all kinds of limitations you run into with how the hardware is designed (like Tensor Core performance for inference especially).
- Issues at scale. Look at the Meta post - their biggest issues are things that are dramatically worse with consumer cards like the RTX 4090, especially when you're running with some kind of goofy PCIe cabling issue (like risers).
- Power. No matter what power limiting you employ an RTX 4090 is pretty bad for power/performance ratio. The card isn't fundamentally designed for these tasks - it's designed to run screaming for a few hours a day so gamers can push as many FPS at high res as possible. Training, inference, etc is a different beast and the performance vs power ratio for these tasks is terrible compared to A/H100. Now let's talk about the physical cabling, PSU, etc issues. Yes miners had hacks for this as well but it's yet another issue.
- Fan design. There isn't a single "blower" style RTX 4090 on the market. There was a dual-slot RTX 3090 at one point (I have a bunch of them) but Nvidia made Gigabyte pull them from the market because people were using them for this. Figuring out some kind of air-cooling setup with the fan and cooling design of the available RTX 4090 cards sounds like a complete nightmare...
- Licensing issues. Again, laying out the $$$ for this with a deployment that almost certainly violates the Nvidia EULA is a risky investment.
Three RTX 4090s (at 9 slots) to get "only" 72GB of VRAM, talking over PCIe, using 48 PCIe lanes, multi-node over sloooow ethernet (hitting CPU - slower and yet more power), using what likely ends up at ~900 watts (power limited) for significantly reduced throughput and less VRAM is ridiculous.
I'm all for creativity but deploying "racks" of 4090s for AI tasks is (frankly) flat-out stupid.
[0] - https://github.com/tinygrad/open-gpu-kernel-modules
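
The totals quoted in the comment above (9 slots, 72GB, 48 lanes, ~900 watts) can be reproduced with a few lines of arithmetic. This is a rough sketch: the 300 W per-card power limit is an assumption inferred from the "~900 watts" figure, the helper names are hypothetical, and the PCIe number is the theoretical one-direction rate rather than a measured throughput.

```python
# Per-card figures; the power limit is an assumption, not a measured value.
VRAM_GB_PER_CARD = 24          # RTX 4090 memory size
SLOTS_PER_CARD = 3             # "~three slot" air-cooled card
PCIE_LANES_PER_CARD = 16       # one full x16 link per card
WATTS_PER_CARD_LIMITED = 300   # assumed limit implied by "~900 watts"


def cluster_totals(cards: int) -> dict:
    """Aggregate resources consumed by `cards` identical GPUs in one node."""
    return {
        "vram_gb": cards * VRAM_GB_PER_CARD,
        "slots": cards * SLOTS_PER_CARD,
        "pcie_lanes": cards * PCIE_LANES_PER_CARD,
        "watts": cards * WATTS_PER_CARD_LIMITED,
    }


def pcie_gbytes_per_s(gt_per_s_per_lane: float, lanes: int,
                      encoding: float = 128 / 130) -> float:
    """Theoretical one-direction bandwidth in GB/s.

    Each transfer carries one bit per lane; 128b/130b is the line encoding
    used by PCIe 3.0 and later.
    """
    return gt_per_s_per_lane * encoding * lanes / 8


totals = cluster_totals(3)
pcie4_x16 = pcie_gbytes_per_s(16.0, 16)  # PCIe 4.0 runs at 16 GT/s per lane
```

Here `pcie4_x16` comes out at roughly 31.5 GB/s per direction before protocol overhead, which is the ceiling the comment is referring to when it calls 16 lanes of PCIe 4 too slow for traffic between cards.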
Tinygrad: Hacked 4090 driver to enable P2P
5 projects | news.ycombinator.com | 12 Apr 2024

Pytorch

Posts with mentions or reviews of Pytorch. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-06-14.

Top 17 Fast-Growing Github Repo of 2024
11 projects | dev.to | 14 Jun 2024

PyTorch
AMD's MI300X Outperforms Nvidia's H100 for LLM Inference
1 project | news.ycombinator.com | 13 Jun 2024

> their own custom stack to interact with GPUs
lol completely made up.
are you conflating CUDA the platform with the C/C++ like language that people write into files that end with .cu? because while some people are indeed not writing .cu files, absolutely no one is skipping the rest of the "stack".
source: i work at one of these "mega corps". hell if you don't believe me go look at how many CUDA kernels pytorch has https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/n....
> Everybody thinks it’s CUDA that makes Nvidia the dominant player.
it 100% does
Awesome List
25 projects | dev.to | 8 Jun 2024

PyTorch - An open source machine learning framework. PyTorch Tutorials - Tutorials and documentation.
Understanding GPT: How To Implement a Simple GPT Model with PyTorch
2 projects | dev.to | 31 May 2024

In this guide, we provided a comprehensive, step-by-step explanation of how to implement a simple GPT (Generative Pre-trained Transformer) model using PyTorch. We walked through the process of creating a custom dataset, building the GPT model, training it, and generating text. This hands-on implementation demonstrates the fundamental concepts behind the GPT architecture and serves as a foundation for more complex applications. By following this guide, you now have a basic understanding of how to create, train, and utilize a simple GPT model. This knowledge equips you to experiment with different configurations, larger datasets, and additional techniques to enhance the model's performance and capabilities. The principles and techniques covered here will help you apply transformer models to various NLP tasks, unlocking the potential of deep learning in natural language understanding and generation. The methodologies presented align with the advancements in transformer models introduced by Vaswani et al. (2017), emphasizing the power of self-attention mechanisms in processing sequences of data more effectively than traditional approaches. This understanding opens pathways to explore and innovate in the field of natural language processing using cutting-edge deep learning techniques (Kingma & Ba, 2015).
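
As a rough illustration of the self-attention mechanism mentioned in the excerpt above, here is a minimal scaled dot-product attention in plain Python. It is not the linked guide's code (which uses PyTorch): it omits the learned query/key/value projections, multiple heads, and the causal mask a GPT model needs, and the function names are chosen only for this sketch.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Each argument is a list of equal-length vectors (lists of floats).
    """
    d_k = len(keys[0])
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return output


# Two identical keys receive equal weight, so the result is the mean value.
result = self_attention([[1.0, 0.0]],
                        [[1.0, 0.0], [1.0, 0.0]],
                        [[0.0, 2.0], [4.0, 6.0]])
```

Every output position is a weighted mix of all value vectors, which is what lets the model relate any two positions in a sequence in a single step.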
Building a Simple Chatbot using GPT model - part 2
1 project | dev.to | 31 May 2024

PyTorch is a powerful and flexible deep learning framework that offers a rich set of features for building and training neural networks.
Clusters Are Cattle Until You Deploy Ingress
16 projects | dev.to | 30 May 2024

Oddly enough, sometimes, the best way to learn is by putting forth incorrect opinions or questions. Recently, while wrestling with AI project complexities, I pondered aloud whether all Docker images with AI models would inevitably be bulky due to PyTorch dependencies. To my surprise, this sparked many helpful responses, offering insights into optimizing image sizes. Being willing to be wrong opens up avenues for rapid learning.
Tinygrad 0.9.0
8 projects | news.ycombinator.com | 28 May 2024

Tinygrad targets consumer hardware (to be precise, only Radeon 7900XTX and nothing else[1]), while ROCm does not actually provide good support for such hardware. For example, the latest release of the hipBLASLt library, 6.1.1, has deep integration with PyTorch[2], while working only on AMD Instinct hardware. And even for the professional hardware out there, the support period is ridiculous: AMD Instinct MI100 (2020) is not supported. Only 4 years and tens of thousands of dollars worth of hardware is going to the trash, yay!
And to be more precise, they still use some core libraries from the ROCm stack[3], they just don't use all these fancy multi-gigabyte[4] hardware-limited rocBLAS/hipBLASLt/rocWMMA/rocRAND/etc. libraries.
[1] https://tinygrad.org/#tinybox
[2] https://github.com/pytorch/pytorch/issues/119081
[3] https://github.com/tinygrad/tinygrad/blob/v0.9.0/tinygrad/ru...
[4] https://repo.radeon.com/rocm/yum/6.1.1/main/
PyTorch 2.3: User-Defined Triton Kernels, Tensor Parallelism in Distributed
1 project | news.ycombinator.com | 10 May 2024
Image classifier with a convolutional neural network (CNN)
2 projects | dev.to | 1 May 2024

PyTorch (https://pytorch.org/)
AI enthusiasm #9 - A multilingual chatbot📣🈸
6 projects | dev.to | 1 May 2024

torch is a package to manage tensors and dynamic neural networks in python (GitHub)

What are some alternatives?

When comparing open-gpu-kernel-modules and Pytorch you can also consider the following projects:

Flux.jl - Relax! Flux is the ML library that doesn't make you tensor

mediapipe - Cross-platform, customizable ML solutions for live and streaming media.

Apache Spark - A unified analytics engine for large-scale data processing

flax - Flax is a neural network library for JAX that is designed for flexibility.

tinygrad - You like pytorch? You like micrograd? You love tinygrad! ❤️ [Moved to: https://github.com/tinygrad/tinygrad]

Pandas - Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Deep Java Library (DJL) - An Engine-Agnostic Deep Learning Framework in Java

tensorflow - An Open Source Machine Learning Framework for Everyone

stable-baselines3 - PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

ROCm - AMD ROCm™ Software - GitHub Home [Moved to: https://github.com/ROCm/ROCm]

tesseract-ocr - Tesseract Open Source OCR Engine (main repository)

OpenCV - Open Source Computer Vision Library
