kernel_tuner vs halutmatmul

halutmatmul

Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator (by joennlae)

Machine Learning matrix-multiplication Pytorch amm Cuda cuda-kernels

Project	kernel_tuner	halutmatmul
Mentions	4	3
Stars	243	201
Growth	9.9%	-
Activity	9.1	9.4
Latest Commit	5 days ago	5 months ago
Language	Python	Python
License	Apache License 2.0	MIT License

Mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars is the number of stars that a project has on GitHub. Growth is the month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.

kernel_tuner

Posts with mentions or reviews of kernel_tuner. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-12.

Ask HN: What apps have you created for your own use?
212 projects | news.ycombinator.com | 12 Dec 2023

I've created Kernel Tuner (https://github.com/KernelTuner/kernel_tuner) as a small software development tool, because I was writing a lot of CUDA and OpenCL kernels at the time. I didn't want to manually figure out the best thread block dimensions and work division among threads on every GPU over and over again.
The tool has evolved quite a bit since the first versions. I'm also using it for testing GPU code and for teaching, and it has become one of the main drivers behind a lot of the research that I do.

PhD'ers, what are you working on? What CS topics excite you?
2 projects | /r/computerscience | 17 Jan 2023

We have an open science policy, so you can use our framework yourself to optimize stuff, if you want! The original paper is linked at the bottom of the GitHub page.

How to Optimize a CUDA Matmul Kernel for CuBLAS-Like Performance: A Worklog
5 projects | news.ycombinator.com | 4 Jan 2023

This is a great post for people who are new to optimizing GPU code.
It is interesting to see that the author got this far without moving the innermost loop over k to the outermost position, as is done in CUTLASS (https://github.com/NVIDIA/cutlass).
As you can see in this blog post, the code ends up with a lot of compile-time constants (e.g. BLOCKSIZE, BM, BN, BK, TM, TN). One way to optimize this code further is to use an auto-tuner, for example Kernel Tuner (https://github.com/KernelTuner/kernel_tuner), to find the optimal value for all of these parameters for your GPU and problem size.
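
The auto-tuning idea in this comment can be sketched in plain Python: enumerate every combination of the compile-time constants, discard combinations that violate hardware limits, and keep the fastest one. This is only an illustration and not Kernel Tuner's API. The candidate values, the limits in `is_valid`, and the `toy_cost` function (a stand-in for compiling and timing the kernel on a GPU) are all assumptions made up for the example.

```python
import itertools

# Hypothetical candidate values for the tile-size constants named in the post.
TUNE_PARAMS = {
    "BM": [64, 128, 256],
    "BN": [64, 128, 256],
    "BK": [8, 16],
    "TM": [4, 8],
    "TN": [4, 8],
}


def is_valid(cfg, max_threads=1024, max_smem_bytes=48 * 1024):
    """Reject configurations that could not launch (assumed limits, float32 tiles)."""
    if cfg["BM"] % cfg["TM"] or cfg["BN"] % cfg["TN"]:
        return False
    # Each thread computes a TM x TN patch of the BM x BN output tile.
    threads = (cfg["BM"] // cfg["TM"]) * (cfg["BN"] // cfg["TN"])
    # One BM x BK tile of A and one BK x BN tile of B, 4 bytes per element.
    smem = 4 * cfg["BK"] * (cfg["BM"] + cfg["BN"])
    return threads <= max_threads and smem <= max_smem_bytes


def brute_force_tune(tune_params, measure, restriction=is_valid):
    """Evaluate every valid configuration; return (best_cfg, best_time, results)."""
    names = list(tune_params)
    results = []
    for values in itertools.product(*(tune_params[n] for n in names)):
        cfg = dict(zip(names, values))
        if restriction(cfg):
            results.append((cfg, measure(cfg)))
    if not results:
        raise ValueError("no valid configuration in the search space")
    best_cfg, best_time = min(results, key=lambda r: r[1])
    return best_cfg, best_time, results


def toy_cost(cfg):
    """Stand-in for a benchmark run: a made-up formula, NOT a performance model."""
    threads = (cfg["BM"] // cfg["TM"]) * (cfg["BN"] // cfg["TN"])
    work_per_thread = cfg["TM"] * cfg["TN"]
    return abs(threads - 256) / 256 + 1.0 / work_per_thread + 1.0 / cfg["BK"]


if __name__ == "__main__":
    best_cfg, best_time, results = brute_force_tune(TUNE_PARAMS, toy_cost)
    print(f"evaluated {len(results)} valid configurations")
    print("best:", best_cfg, f"cost={best_time:.3f}")
```

In a real auto-tuner, `measure` would compile the kernel with the given constants and time it on the target GPU, and an exhaustive loop like this one becomes impractical once the search space grows large.
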
Kernel Tuner
1 project | news.ycombinator.com | 30 Apr 2021

halutmatmul

Posts with mentions or reviews of halutmatmul. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-21.

Show HN: Stella Nera – Maddness Hardware Accelerator
1 project | news.ycombinator.com | 21 Nov 2023

10x faster matrix and vector operations
4 projects | news.ycombinator.com | 18 Jun 2022

This master's thesis sort of does it, but it doesn't have any fine-tuning yet so it completely wrecks the accuracy: https://github.com/joennlae/halutmatmul.
If someone worked on contributing this to Composer [1] I'd be down to help out. I can't justify building it all on my own right now since we're 100% focused on training speedup, but I could definitely meet and talk through it, help code tricky parts, review PRs, etc.
[1] https://github.com/mosaicml/composer

What are some alternatives?

When comparing kernel_tuner and halutmatmul you can also consider the following projects:

pyopencl - OpenCL integration for Python, plus shiny features

QualityScaler - image/video deep-learning upscaling for any GPU

tf-quant-finance - High-performance TensorFlow library for quantitative finance.

3d-ken-burns - an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

arrayfire-python - Python bindings for ArrayFire: A general purpose GPU library.

composer - Supercharge Your Model Training

scikit-cuda - Python interface to GPU-powered libraries

bolt - 10x faster matrix and vector operations

BlendLuxCore - Blender Integration for LuxCore

PyTorch-Guide - PyTorch Guide

catboost - A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

caer - High-performance Vision library in Python. Scale your research, not boilerplate.
