Accelerate PyTorch with Taichi: Data Preprocessing & High-performance ML Operator Customization

RWKV-CUDA

3 186 8.5 Cuda

The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM )

This repo is an interesting example of customizing an ML operator in CUDA. The author built an RWKV language model around a custom operator that behaves like a one-dimensional depthwise convolution. The model itself is not computationally heavy, but it still runs slowly because PyTorch has no native support for this operator. The author therefore implemented the operator in CUDA and applied a set of optimization techniques, such as loop fusion and shared memory, achieving performance 20x better than the PyTorch implementation.
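To make "one-dimensional depthwise convolution" concrete, here is a minimal pure-Python reference sketch. It is not the repo's CUDA kernel: the function name `causal_depthwise_conv1d`, the causal (past-only) indexing, and the list-based layout are assumptions chosen for illustration, and the exact weight indexing in RWKV-CUDA may differ. The point is only that each channel is convolved with its own kernel, with no mixing across channels.

```python
def causal_depthwise_conv1d(x, w):
    """Reference (slow) depthwise 1D convolution over time.

    x: list of C channels, each a list of T values.
    w: list of C kernels, each a list of T weights, where w[c][d]
       weights the input that lies d steps in the past.
       (Assumed layout, for illustration only.)
    Returns a list of C output channels, each of length T.
    """
    out = []
    # Depthwise: every channel uses only its own kernel.
    for x_c, w_c in zip(x, w):
        out_c = []
        for t in range(len(x_c)):
            acc = 0.0
            # Causal: position t only sees inputs at positions u <= t.
            for u in range(t + 1):
                acc += w_c[t - u] * x_c[u]
            out_c.append(acc)
        out.append(out_c)
    return out


x = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
w = [[1.0, 0.5, 0.25], [2.0, 0.0, 0.0]]
result = causal_depthwise_conv1d(x, w)
print(result)  # [[1.0, 2.5, 4.25], [2.0, 2.0, 2.0]]
```

The triple loop does little arithmetic per element, which fits the description above: the cost comes less from the amount of computation than from how it is executed. A hand-written kernel can fuse these loops and reuse data from fast on-chip memory instead of launching many small framework operations.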

blog_code

1 4 10.0 Python

Pure PyTorch padding

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Show HN: One Billion Rows in CUDA
1 project | news.ycombinator.com | 13 Apr 2024
The Simple Beauty of XOR Floating Point Compression
1 project | news.ycombinator.com | 11 Apr 2024
Show HN: Faster sorting with register shuffling in CUDA
1 project | news.ycombinator.com | 15 Mar 2024
Raft: Fundamental widely-used algorithms and primitives for machine learning
1 project | news.ycombinator.com | 22 Feb 2024
A Fast FP16xFP4 Gemm CUDA Kernel
1 project | news.ycombinator.com | 29 Jan 2024

This page summarizes the projects mentioned and recommended in the original post on /r/Python. Post date: 14 Sep 2022
