Faster_SGEMM_CUDA Alternatives
Similar projects and alternatives to Faster_SGEMM_CUDA
NOTE:
The number of mentions on this list counts mentions in common posts plus user-suggested alternatives.
Hence, a higher number indicates a better Faster_SGEMM_CUDA alternative or a more similar project.
Faster_SGEMM_CUDA discussion
Faster_SGEMM_CUDA reviews and mentions
Posts with mentions or reviews of Faster_SGEMM_CUDA.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2024-07-26.
- Single precision matrix multiplication up to 43% faster than the one from cuBLAS
- Fast Multidimensional Matrix Multiplication on CPU from Scratch
Related: I created a CUDA kernel typically much faster than kernels from cuBLAS when multiplying large square float32 matrices. Tested mostly on a 4090 GPU so far.
Source code: https://github.com/arekpaterek/Faster_SGEMM_CUDA
size tflops_cublas tflops_my diff gpu
- Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090
- How to Optimize a CUDA Matmul Kernel for CuBLAS-Like Performance: A Worklog
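The column names in the excerpt above (tflops_cublas, tflops_my, diff) suggest that each kernel is scored by throughput and that the two are compared as a percentage. A minimal sketch of that arithmetic follows, assuming the conventional count of 2·n³ floating-point operations for multiplying two n×n matrices. The function names and the sample timings are hypothetical and are not taken from the Faster_SGEMM_CUDA repository or its benchmark.

```python
def sgemm_tflops(n: int, seconds: float) -> float:
    """Throughput in TFLOPS for one n x n by n x n float32 multiply.

    Assumes the usual 2 * n^3 operation count (n^3 multiplies plus
    roughly n^3 adds); the source does not state which count it uses.
    """
    flops = 2.0 * n ** 3
    return flops / seconds / 1e12


def percent_faster(tflops_new: float, tflops_ref: float) -> float:
    """Relative speedup of tflops_new over tflops_ref, in percent."""
    return (tflops_new / tflops_ref - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical timings for illustration only, not measured results.
    n = 4096
    ref = sgemm_tflops(n, seconds=0.0030)  # stand-in for a cuBLAS run
    new = sgemm_tflops(n, seconds=0.0025)  # stand-in for a custom kernel
    print(f"{n} {ref:.2f} {new:.2f} {percent_faster(new, ref):.1f}%")
```

Because both kernels perform the same number of operations at a given size, the percentage difference in TFLOPS equals the percentage difference in inverse run time, so a headline such as "up to 43% faster" can be read either way.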
Stats
Basic Faster_SGEMM_CUDA repo stats
4
0
3.3
11 months ago
arekpaterek/Faster_SGEMM_CUDA is an open source project licensed under the MIT License, which is an OSI-approved license.
The primary programming language of Faster_SGEMM_CUDA is CUDA.