Top 44 Trending CUDA Projects
nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
SpargeAttn
SpargeAttention: a training-free sparse attention mechanism that can accelerate inference for any model.
SageAttention
Quantized attention that achieves 2-3x and 3-5x speedups over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
HRNet-Human-Pose-Estimation
This repo is copied from https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
array-language-comparisons
A comparison of array languages & libraries: APL, J, BQN, Uiua, Q, Julia, R, NumPy, Nial, Futhark, Dex, Ivy, SaC & ArrayFire.
raft
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications. (by rapidsai)
GPUSorting
State of the art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
dietgpu
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
deep-high-resolution-net.pytorch
The project is an official implementation of our CVPR 2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"
deep-painterly-harmonization
Code and data for paper "Deep Painterly Harmonization": https://arxiv.org/abs/1804.03189
Index
What are some of the trending open-source CUDA projects? This list will help you:
# | Project | Growth |
---|---|---|
1 | FlashMLA | 99.4% |
2 | DeepEP | 95.1% |
3 | nunchaku | 51.6% |
4 | BenchmarkCustomPTX | 36.6% |
5 | SpargeAttn | 22.0% |
6 | AlexNet-Source-Code | 17.4% |
7 | SageAttention | 12.5% |
8 | flash-attention-minimal | 11.3% |
9 | causal-conv1d | 10.0% |
10 | ThunderKittens | 10.0% |
11 | Parallel-Computing-Cuda-C | 7.5% |
12 | NATTEN | 7.2% |
13 | Nanoflow | 6.8% |
14 | nccl-tests | 6.6% |
15 | CUDALibrarySamples | 5.9% |
16 | cuda_programming | 5.6% |
17 | HRNet-Human-Pose-Estimation | 4.9% |
18 | array-language-comparisons | 4.6% |
19 | cugraph | 4.3% |
20 | cuhnsw | 3.9% |
21 | cuspatial | 2.9% |
22 | raft | 2.6% |
23 | llm.c | 2.4% |
24 | GPUSorting | 1.8% |
25 | TorchPQ | 1.8% |
26 | k2 | 1.8% |
27 | cuda-convnet2 | 1.8% |
28 | Gpufit | 1.5% |
29 | CGBN | 1.4% |
30 | RWKV-CUDA | 1.4% |
31 | instant-ngp | 1.3% |
32 | HVM | 1.2% |
33 | Lantern | 1.2% |
34 | megalodon | 1.0% |
35 | dietgpu | 0.9% |
36 | kilonerf | 0.8% |
37 | blocksparse | 0.7% |
38 | unet.cu | 0.7% |
39 | MegBA | 0.6% |
40 | deep-high-resolution-net.pytorch | 0.4% |
41 | nvParse | 0.2% |
42 | deep-painterly-harmonization | 0.0% |
43 | instant-ngp-Windows | 0.0% |
44 | SENet | 0.0% |