Top 23 Cuda Open-Source Projects
-
vLLM stands for virtual large language model. It is an open-source library for fast LLM inference and serving. As the name suggests, "virtual" borrows the concepts of virtual memory and paging from operating systems: its PagedAttention mechanism applies them to the attention cache to make fuller use of GPU memory and generate tokens faster. Traditional LLM serving stores each request's large attention key and value tensors in contiguous GPU memory reserved up front, which wastes memory through fragmentation and over-reservation.
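To make the paging analogy concrete, here is a minimal sketch in plain Python of a block table that maps each sequence's tokens onto fixed-size physical blocks allocated on demand. It is a toy model of the idea, not vLLM's implementation: `PagedKVCache`, `BLOCK_SIZE`, and the method names are hypothetical, and no tensors or GPU memory are involved.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per physical block (hypothetical value, small for illustration)


@dataclass
class PagedKVCache:
    """Toy block-table allocator: each sequence owns a list of physical block ids."""
    num_blocks: int
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> [physical block ids]
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored so far

    def __post_init__(self):
        # Every physical block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (physical_block, offset)."""
        n = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if n % BLOCK_SIZE == 0:
            # The last block is full (or the sequence is new): take a free block.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        return table[n // BLOCK_SIZE], n % BLOCK_SIZE

    def free(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because blocks are handed out only when a sequence actually grows into them, a sequence never reserves more than one partially filled block, and blocks freed by a finished sequence can be reused by any other sequence regardless of where they sit in memory.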
-
Project mention: Build An Advanced Password Cracker With Python (Complete Guide) | dev.to | 2024-10-07
Download Hashcat from the official website.
-
Use of Open-Source Solutions and Customizable Models. On-premise systems, such as Lingvanex and Kaldi, provide tools to develop speech recognition models from scratch or based on open-source libraries. Unlike cloud services, where developers are limited to pre-built models, on-premise solutions allow you to create a system that fully matches the specifics of the task. For example, models can be trained on specific datasets, including professional vocabulary, dialects, or phrases typical to certain fields (e.g., healthcare or law).
-
Project mention: Ask HN: Resources for General Purpose GPU development on Apple's M* chips? | news.ycombinator.com | 2024-12-25
If you're looking for a high level introduction to GPU development on Apple silicon I would recommend learning Metal. It's Apple's GPU acceleration language similar to CUDA for Nvidia hardware. I ported a set of puzzles for CUDA called GPU-Puzzles (a collection of exercises designed to teach GPU programming fundamentals)[1] to Metal [2]. I think it's a very accessible introduction to Metal and writing GPU kernels.
[1] https://github.com/srush/GPU-Puzzles
[2] https://github.com/abeleinin/Metal-Puzzles
-
I'm surprised to see PyTorch and JAX mentioned as alternatives but not Numba: https://github.com/numba/numba
I recently had to implement a few kernels to lower the memory footprint and runtime of some PyTorch function. It's been really nice because Numba kernels support type hints (as opposed to raw CuPy kernels).
-
Project mention: Unleashing GPU Power: Supercharge Your Data Processing with cuDF | dev.to | 2024-06-21
cuDF Documentation
-
If you are working with an ML team that trained their own model or you want to host any model off Huggingface and use the same Docker container approach, you can also check out cog by Replicate. It wraps Docker and is specifically designed for creating Docker containers for ML models.
-
catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
-
gocv
Go package for computer vision using OpenCV 4 and beyond. Includes support for DNN, CUDA, OpenCV Contrib, and OpenVINO.
Project mention: Cylon: JavaScript framework for robotics, drones, and the Internet of Things | news.ycombinator.com | 2024-05-04
-
Years ago I started a collection of convolution optimization resources: https://github.com/mratsim/laser/wiki/Convolution-optimisati...
Also checked and apparently Nvidia Cutlass now supports generic convolutions: https://github.com/NVIDIA/cutlass
-
Project mention: Alien – CUDA-powered artificial life simulation program | news.ycombinator.com | 2024-08-17
Pretty neat! Looks like AMD might be a possibility, as some folks are trying to run it: https://github.com/chrxh/alien/issues/99
-
nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Project mention: nvitop VS nviwatch - a user suggested alternative | libhunt.com/r/nvitop | 2024-09-09
-
Are you aware of HIP? It's officially supported and, for code that avoids obscure features of CUDA like inline PTX, it's pretty much a find-and-replace to get a working build:
https://github.com/ROCm/HIP
Don't believe me? Include this at the top of your CUDA code, build with hipcc, and see what happens:
https://gitlab.com/StanfordLegion/legion/-/blob/master/runti...
It's incomplete because I'm lazy but you can see most things are just a single #ifdef away in the implementation.
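As a rough illustration of the "find-and-replace" claim, the sketch below renames a handful of CUDA runtime identifiers to their HIP counterparts in a source string. This is a swapped-in technique for illustration: the header linked above works through preprocessor definitions rather than text rewriting, and ROCm ships its own HIPIFY tools for real ports. The function name `hipify_sketch` is hypothetical, and the mapping is a small subset chosen for the example.

```python
import re

# A small, illustrative subset of CUDA runtime names and their HIP equivalents.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Word boundaries ensure only whole identifiers are renamed, so a name such as
# my_cudaMalloc_wrapper is left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def hipify_sketch(source: str) -> str:
    """Rename known CUDA identifiers in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d, n); cudaMemcpy(d, h, n, cudaMemcpyHostToDevice); cudaFree(d);"
    )
    print(hipify_sketch(cuda_src))
```

Anything outside the table, such as inline PTX or library-specific calls, passes through unchanged and would still need manual attention.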
Cuda related posts
- The Missing Nvidia GPU Glossary
- Show HN: HipScript – Run CUDA in the Browser with WebAssembly and WebGPU
- Making AMD GPUs competitive for LLM inference
- How to run llama 405b bf16 with gh200s
- Fast LLM Inference From Scratch (using CUDA)
- Cloud Solutions vs. On-Premise Speech Recognition Systems
- The Success and Failure of Ninja (2020)
A note from our sponsor - SaaSHub
www.saashub.com | 15 Jan 2025
Index
What are some of the best open-source Cuda projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | vllm | 33,579 |
2 | hashcat | 21,716 |
3 | instant-ngp | 16,181 |
4 | Kaldi Speech Recognition Toolkit | 14,426 |
5 | Open3D | 11,751 |
6 | ZLUDA | 10,312 |
7 | GPU-Puzzles | 10,290 |
8 | Numba | 10,117 |
9 | cupy | 9,667 |
10 | cudf | 8,595 |
11 | cog | 8,269 |
12 | catboost | 8,182 |
13 | gocv | 6,810 |
14 | cuda-samples | 6,748 |
15 | oneflow | 6,554 |
16 | cutlass | 5,995 |
17 | chainer | 5,894 |
18 | alien | 5,048 |
19 | nvitop | 5,005 |
20 | ArrayFire | 4,598 |
21 | cuml | 4,342 |
22 | tiny-cuda-nn | 3,826 |
23 | HIP | 3,826 |