[P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl

kernl

Mentions: 8 | Stars: 1,458 | Activity: 1.5 | Language: Jupyter Notebook

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

I periodically check kernl.ai to see whether the documentation and tutorial sections have been expanded. My advice is to put some real effort and focus into examples and tutorials. This is key for an optimization/acceleration library. 10x-ing the users of a library like this is much more likely to come from spending 10 out of every 100 developer hours writing tutorials than from spending 8 or 9 of those tutorial-writing hours on developing new features that only a small minority understand how to apply.
whisper.cpp

Mentions: 187 | Stars: 29,540 | Activity: 9.8 | Language: C

Port of OpenAI's Whisper model in C/C++

I just discovered the project https://github.com/ggerganov/whisper.cpp
whisper

Mentions: 343 | Stars: 59,916 | Activity: 6.8 | Language: Python

Robust Speech Recognition via Large-Scale Weak Supervision

We measured a 2.3x speedup on an Nvidia A100 GPU (2.4x on an RTX 3090) compared to the Hugging Face implementation, using FP16 mixed precision, when transcribing the LibriSpeech test set (over 2,600 examples). For now, the OpenAI implementation is not yet PyTorch 2.0 compliant.
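For readers who want to reproduce this kind of comparison, here is a minimal sketch of how a speedup ratio can be computed from wall-clock timings. It is not the benchmark used in the post: `median_latency`, `speedup`, and their parameters are hypothetical names chosen for illustration, and the two callables stand in for a baseline and an optimized transcription function.

```python
import statistics
import time


def median_latency(fn, inputs, warmup=2, repeats=5):
    """Median wall-clock time (seconds) for one pass of `fn` over `inputs`.

    Hypothetical helper for illustration. When timing GPU code, the device
    must be synchronized before each clock read (for PyTorch,
    torch.cuda.synchronize()), otherwise only the kernel launch is timed.
    """
    for _ in range(warmup):  # warm-up passes are run but not timed
        for x in inputs:
            fn(x)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            fn(x)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_seconds, optimized_seconds):
    """Ratio of baseline time to optimized time; above 1 means faster."""
    if optimized_seconds <= 0:
        raise ValueError("optimized_seconds must be positive")
    return baseline_seconds / optimized_seconds
```

With the same inputs passed to both functions, `speedup(median_latency(baseline_fn, inputs), median_latency(optimized_fn, inputs))` gives a figure comparable in spirit to the one quoted above.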
flash-attention

Mentions: 26 | Stars: 10,642 | Activity: 9.4 | Language: Python

Fast and memory-efficient exact attention

Jobs are parallelized along different axes: batch and attention head in the original flash attention, and the author of the Triton implementation added a third one, tokens, i.e. the third dimension of Q (this important trick is now also part of the flash attention CUDA implementation).
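The decomposition described above can be illustrated with a small pure-Python sketch. It only shows how the work splits into independent jobs over (batch, head, block of query tokens). It does not implement flash attention's tiling over K/V or its online softmax, and the jobs run sequentially here rather than on a GPU. The names `job_grid`, `tiled_attention`, and `block_m` are assumptions made for this example.

```python
import math
from itertools import product


def attention_rows(q_rows, k, v):
    """Plain softmax(q . k / sqrt(d)) weighted sum of v, one query row at a time."""
    d = len(k[0])
    out = []
    for q in q_rows:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in k]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * row[j] for w, row in zip(weights, v)) / total
            for j in range(len(v[0]))
        ])
    return out


def job_grid(batch, heads, seq_len, block_m):
    """One job per (batch index, head index, block of query tokens)."""
    q_blocks = math.ceil(seq_len / block_m)
    return list(product(range(batch), range(heads), range(q_blocks)))


def tiled_attention(q, k, v, block_m):
    """q, k, v are nested lists shaped [batch][head][token][dim]."""
    batch, heads, seq_len = len(q), len(q[0]), len(q[0][0])
    out = [[[None] * seq_len for _ in range(heads)] for _ in range(batch)]
    # Each job writes a disjoint slice of the output, so no job depends on another.
    for b, h, blk in job_grid(batch, heads, seq_len, block_m):
        start = blk * block_m
        stop = min(start + block_m, seq_len)
        out[b][h][start:stop] = attention_rows(q[b][h][start:stop], k[b][h], v[b][h])
    return out
```

Without the token axis, a small batch with few heads yields only `batch * heads` jobs. Splitting the query tokens multiplies that by the number of query blocks, which is what gives the GPU more independent work to schedule.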
TensorRT

Mentions: 22 | Stars: 9,031 | Activity: 5.0 | Language: C++

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

The traditional way to deploy a model is to export it to ONNX, then to the TensorRT plan format. Each step requires its own tooling and its own mental model, and may raise some issues. The most annoying thing is that you need Microsoft or Nvidia support to get the best performance, and sometimes model support takes time. For instance, T5, a model released in 2019, is not yet correctly supported on TensorRT; in particular, the K/V cache is missing (according to the TensorRT maintainers it will be supported soon, but I wrote the very same thing almost 1 year ago and then again 4 months ago, so… I don’t know).

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.
Tags: Transformer, TensorRT, OpenAI, CUDA, Nvidia
Post date: 8 Feb 2023
