kernel_tuner
spotprice
| | kernel_tuner | spotprice |
|---|---|---|
| Mentions | 4 | 6 |
| Stars | 243 | 28 |
| Growth | 9.9% | - |
| Activity | 9.1 | 6.3 |
| Latest commit | 4 days ago | 5 months ago |
| Language | Python | Go |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
kernel_tuner
-
Ask HN: What apps have you created for your own use?
I've created Kernel Tuner (https://github.com/KernelTuner/kernel_tuner) as a small software development tool, because I was writing a lot of CUDA and OpenCL kernels at the time. I didn't want to manually figure out the best thread block dimensions and the best division of work among threads on every GPU, over and over again.
The tool has evolved quite a bit since the first versions. I'm also using it for testing GPU code and for teaching, and it has become one of the main drivers behind a lot of the research that I do.
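The basic idea of auto-tuning described above can be sketched as a brute-force loop: enumerate every combination of tunable parameters, time the code for each, and keep the fastest. This is a minimal CPU-only illustration, not Kernel Tuner's API; `tune`, the toy workload `blocked_sum`, and its `block_size` parameter are hypothetical stand-ins for a GPU kernel and its thread block dimensions.

```python
import itertools
import time
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def tune(func: Callable[..., object], tune_params: Dict[str, List[int]],
         repeats: int = 3) -> Tuple[Config, List[Tuple[Config, float]]]:
    """Time func for every parameter combination and return the fastest one."""
    names = list(tune_params)
    results: List[Tuple[Config, float]] = []
    for values in itertools.product(*(tune_params[n] for n in names)):
        config = dict(zip(names, values))
        # Keep the best of several runs to reduce timing noise.
        best_time = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(**config)
            best_time = min(best_time, time.perf_counter() - start)
        results.append((config, best_time))
    best_config = min(results, key=lambda r: r[1])[0]
    return best_config, results


DATA = list(range(100_000))


def blocked_sum(block_size: int) -> int:
    """Hypothetical workload: sum DATA in chunks of block_size elements."""
    total = 0
    for i in range(0, len(DATA), block_size):
        total += sum(DATA[i:i + block_size])
    return total


if __name__ == "__main__":
    best, all_results = tune(blocked_sum, {"block_size": [32, 64, 128, 256, 512]})
    for config, seconds in all_results:
        print(config, f"{seconds * 1e3:.3f} ms")
    print("best:", best)
```

A real GPU tuner replaces the toy workload with compiling and launching a kernel per configuration, and typically also checks that each configuration produces correct output.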
-
PhD'ers, what are you working on? What CS topics excite you?
We have an open science policy, so you can use our framework yourself to optimize stuff, if you want! The original paper is linked at the bottom of the GitHub page.
-
How to Optimize a CUDA Matmul Kernel for CuBLAS-Like Performance: A Worklog
This is a great post for people who are new to optimizing GPU code.
It is interesting to see that the author got this far without moving the innermost loop over k to the outermost position, as is done in CUTLASS (https://github.com/NVIDIA/cutlass).
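To make the loop-interchange remark concrete: with the loop over k outermost, each iteration adds the outer product of column k of A and row k of B into C, so each loaded column/row pair is reused across a whole block of C. The pure-Python sketch below (function names are illustrative) shows only the two loop orders and that they compute the same result; it is not CUTLASS's implementation.

```python
from typing import List

Matrix = List[List[float]]


def matmul_ijk(a: Matrix, b: Matrix) -> Matrix:
    """Textbook order: k is innermost, one dot product per element of C."""
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for k in range(k_dim):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def matmul_kij(a: Matrix, b: Matrix) -> Matrix:
    """Interchanged order: k is outermost, each step accumulates an outer product."""
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k in range(k_dim):
        b_row = b[k]            # row k of B, read once per k
        for i in range(m):
            a_ik = a[i][k]      # element i of column k of A
            c_row = c[i]
            for j in range(n):
                c_row[j] += a_ik * b_row[j]
    return c


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_ijk(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
    print(matmul_kij(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```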
As this blog post shows, the code ends up with a lot of compile-time constants (e.g. BLOCKSIZE, BM, BN, BK, TM, TN). One way to optimize it further is to use an auto-tuner, for example Kernel Tuner (https://github.com/KernelTuner/kernel_tuner), to find the optimal values of all these parameters for your GPU and problem size.
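As a rough sketch of the search space such a tuner would explore, the snippet below enumerates combinations of BM, BN, BK, TM and TN and filters out configurations a kernel could not launch. The candidate values and the constraints (BM divisible by TM and BN by TN, at most 1024 threads per block assuming (BM·BN)/(TM·TN) threads, and 4-byte float tiles of A and B fitting in 48 KiB of shared memory) are assumptions loosely modeled on a 2D block-tiled SGEMM kernel; the real restrictions depend on the actual kernel. Kernel Tuner accepts similar constraints through its `restrictions` argument, but this code is plain Python, not its API.

```python
import itertools
from typing import Dict, Iterator, List

MAX_THREADS_PER_BLOCK = 1024    # assumed hardware limit
MAX_SHARED_BYTES = 48 * 1024    # assumed shared-memory budget per block
FLOAT_BYTES = 4

# Hypothetical candidate values for each compile-time constant.
TUNE_PARAMS: Dict[str, List[int]] = {
    "BM": [64, 128, 256],
    "BN": [64, 128, 256],
    "BK": [8, 16, 32, 64],
    "TM": [2, 4, 8],
    "TN": [2, 4, 8],
}


def is_valid(cfg: Dict[str, int]) -> bool:
    """Assumed restrictions for a 2D block-tiled SGEMM kernel."""
    if cfg["BM"] % cfg["TM"] or cfg["BN"] % cfg["TN"]:
        return False
    threads = (cfg["BM"] * cfg["BN"]) // (cfg["TM"] * cfg["TN"])
    if threads > MAX_THREADS_PER_BLOCK:
        return False
    # Shared-memory tiles: BM x BK of A plus BK x BN of B.
    shared = (cfg["BM"] * cfg["BK"] + cfg["BK"] * cfg["BN"]) * FLOAT_BYTES
    return shared <= MAX_SHARED_BYTES


def search_space(params: Dict[str, List[int]]) -> Iterator[Dict[str, int]]:
    """Yield every parameter combination that passes is_valid."""
    names = list(params)
    for values in itertools.product(*(params[n] for n in names)):
        cfg = dict(zip(names, values))
        if is_valid(cfg):
            yield cfg


if __name__ == "__main__":
    configs = list(search_space(TUNE_PARAMS))
    total = 1
    for values in TUNE_PARAMS.values():
        total *= len(values)
    print(f"{len(configs)} of {total} configurations pass the restrictions")
```

Pruning invalid points up front keeps the tuner from compiling and benchmarking configurations that would fail to launch.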
- Kernel Tuner
spotprice
-
Ask HN: What apps have you created for your own use?
All of mine are CLI...
https://github.com/jftuga/less-Windows - [not really mine, but I just help maintain the port] - GNU less compiled for Windows 10 & 11. Stand-alone version with no dependencies.
https://github.com/jftuga/gofwd - A cross-platform TCP port forwarder with Duo 2FA and Geo-IP integration
https://github.com/jftuga/spotprice - Quickly get AWS spot instance pricing - a bit easier to use than the AWS CLI; it is also faster and has more features
https://github.com/jftuga/tcpscan - A standalone, fast, simple, multi-threaded cross-platform IPv4 TCP port scanner
https://github.com/jftuga/ipinfo - Return IP address info including geographic location and distance when given IP address, email address, host name or URL
https://github.com/jftuga/photo_id_resizer - Resize photo ID images using face recognition technology
https://github.com/jftuga/chars - Determine the end-of-line format, tabs, BOM, and NUL characters
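The kind of analysis the `chars` entry describes can be sketched in a few lines. This is an illustrative reimplementation of the idea, not the tool's code, and its rules and output format may differ: `char_stats` and `eol_format` are hypothetical names that count CRLF, bare LF and bare CR line endings, tabs and NUL bytes, and check for a UTF-8 byte-order mark.

```python
from typing import Dict

UTF8_BOM = b"\xef\xbb\xbf"


def char_stats(data: bytes) -> Dict[str, int]:
    """Count line endings, tabs and NUL bytes, and detect a UTF-8 BOM."""
    crlf = data.count(b"\r\n")
    # Every CRLF contains one CR and one LF, so subtract it to get the bare ones.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {
        "crlf": crlf,
        "lf": lf,
        "cr": cr,
        "tabs": data.count(b"\t"),
        "nul": data.count(b"\x00"),
        "bom": int(data.startswith(UTF8_BOM)),
    }


def eol_format(stats: Dict[str, int]) -> str:
    """Classify the line-ending style: crlf, lf, cr, mixed, or none."""
    kinds = [k for k in ("crlf", "lf", "cr") if stats[k]]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            stats = char_stats(fh.read())
        print(path, eol_format(stats), stats)
```

Reading the file as raw bytes matters here: opening it in text mode would let Python translate line endings before they could be counted.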
-
How to find regions where p4d.24xlarge instances are available?
I wrote a program to quickly and easily get AWS EC2 spot prices. Here is the output:
-
Leveraging Mispriced AWS Spot Instances
I wrote a program to get AWS spot instance pricing. This program is similar to using "aws ec2 describe-spot-price-history" but is faster and has a few more options.
https://github.com/jftuga/spotprice
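The post-processing step of such a lookup is small. AWS's DescribeSpotPriceHistory returns records with `AvailabilityZone`, `InstanceType`, `ProductDescription`, `SpotPrice` (a string) and `Timestamp` fields; in boto3 the call is `ec2.describe_spot_price_history(...)`. The sketch below skips the API call and works on made-up sample records: it keeps the newest price per availability zone and then picks the cheapest zone per region. Deriving the region by dropping the zone's trailing letter is an assumption that works for standard names like `us-east-1a` but may not hold for Local Zone names. This is not spotprice's implementation, which is written in Go.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Dict, Iterable, List, Tuple


def latest_by_zone(records: Iterable[dict]) -> Dict[str, dict]:
    """Keep only the most recent record for each availability zone."""
    latest: Dict[str, dict] = {}
    for rec in records:
        zone = rec["AvailabilityZone"]
        if zone not in latest or rec["Timestamp"] > latest[zone]["Timestamp"]:
            latest[zone] = rec
    return latest


def region_of(zone: str) -> str:
    """Assumption: a standard zone name is its region plus one trailing letter."""
    return zone[:-1]


def cheapest_by_region(records: Iterable[dict]) -> Dict[str, Tuple[str, Decimal]]:
    """Map region -> (zone, price) for the lowest current spot price in that region."""
    best: Dict[str, Tuple[str, Decimal]] = {}
    for zone, rec in latest_by_zone(records).items():
        price = Decimal(rec["SpotPrice"])  # the API returns prices as strings
        region = region_of(zone)
        if region not in best or price < best[region][1]:
            best[region] = (zone, price)
    return best


def _rec(zone: str, price: str, hour: int) -> dict:
    """Build a hypothetical record shaped like one SpotPriceHistory entry."""
    return {"AvailabilityZone": zone, "InstanceType": "m5.large",
            "ProductDescription": "Linux/UNIX", "SpotPrice": price,
            "Timestamp": datetime(2024, 1, 1, hour, tzinfo=timezone.utc)}


# Made-up sample data for illustration only.
SAMPLE: List[dict] = [
    _rec("us-east-1a", "0.0410", 10),
    _rec("us-east-1a", "0.0380", 12),  # newer record replaces the 10:00 price
    _rec("us-east-1b", "0.0395", 11),
    _rec("eu-west-1a", "0.0420", 12),
]

if __name__ == "__main__":
    for region, (zone, price) in sorted(cheapest_by_region(SAMPLE).items()):
        print(f"{region}: {zone} ${price}/hour")
```

Using `Decimal` rather than `float` keeps the string prices exact when comparing them.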
- AWS EC2 Spot Instances Availability by Region
- Common avenues for reducing waste in AWS (Specifically EC2)
- Ask HN: What are some tools / libraries you built yourself?
What are some alternatives?
halutmatmul - Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator
lowdefy - The config web stack for business apps - build internal tools, client portals, web apps, admin panels, dashboards, web sites, and CRUD apps with YAML or JSON.
pyopencl - OpenCL integration for Python, plus shiny features
terraform_ec2_spot_instance - Use terraform to create an AWS EC2 spot instance
tf-quant-finance - High-performance TensorFlow library for quantitative finance.
rupy - HTTP App. Server and JSON DB - Shared Parallel (Atomic) & Distributed
arrayfire-python - Python bindings for ArrayFire: A general purpose GPU library.
yadm - Yet Another Dotfiles Manager
scikit-cuda - Python interface to GPU-powered libraries
sqldb-logger - A logger for Go SQL database driver without modifying existing *sql.DB stdlib usage.
BlendLuxCore - Blender Integration for LuxCore
Tabula - Extract tables from PDF files