Mentions | Stars | Project | Description |
---|---|---|---|
16 | 23,213 | llm.c | LLM training in simple, raw C/CUDA |
1 | 396 | Nanoflow | A throughput-oriented high-performance serving framework for LLMs |
1 | 45 | gemlite | Simple and fast low-bit matmul kernels in CUDA |
1 | 3 | cuda-utils | CUDA utilities/helpers for simplifying multidimensional indexing |
Latest Mentions
Latest mentioned Cuda repos
Stars | Project |
---|---|
396 | Nanoflow |
3 | cuda-utils |
23,213 | llm.c |
45 | gemlite |
0 | Faster_SGEMM_CUDA |
2 | octomul |
150 | flute |
29 | vllmini |
3 | scaling-democracy |
27 | llm.c |
560 | unet.cu |
187 | cuda-checkpoint |
16 | hillisp |
501 | megalodon |
6 | PMPP_notes |
1,475 | ThunderKittens |
10,410 | HVM |
52 | jaxsplat |
27 | simpleGEMM |
198 | CGBN |
Latest Discoveries
Latest discovered Cuda repos
Stars | Project |
---|---|
396 | Nanoflow |
3 | cuda-utils |
45 | gemlite |
0 | Faster_SGEMM_CUDA |
2 | octomul |
150 | flute |
29 | vllmini |
3 | scaling-democracy |
560 | unet.cu |
16 | hillisp |
6 | PMPP_notes |
27 | llm.c |
52 | jaxsplat |
1,475 | ThunderKittens |
27 | simpleGEMM |
198 | CGBN |
187 | cuda-checkpoint |
501 | megalodon |
8 | cuda-1brc |
23,213 | llm.c |
Recently updated posts
- Nanoflow: A throughput-oriented high-performance serving framework for LLMs
- CUDA Utils – Simple CUDA utils/helpers for simplifying ND tensor memory access
- Gemlite: Simple and fast low-bit matmul kernels in CUDA
- Single precision matrix multiplication up to 43% faster than the one from cuBLAS
- Fast Multidimensional Matrix Multiplication on CPU from Scratch