Show HN: Port of OpenAI's Whisper model in C/C++

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • Hi HN,

    OpenAI recently released a model for automatic speech recognition called Whisper [0]. I decided to reimplement the inference of the model from scratch using C/C++. To achieve this, I implemented a minimalistic tensor library in C and ported the high-level architecture of the model to C++. The entire implementation is less than 8000 lines of code, contained in just 2 source files, without any third-party dependencies. The GitHub project is here:

    https://github.com/ggerganov/whisper.cpp
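
    For a flavor of the tensor library, here is a minimal sketch of building and evaluating a computation graph with the ggml C API, based on my reading of ggml.h in the repository (treat the exact names and struct fields as approximate; the header is authoritative):

      #include "ggml.h"

      int main(void) {
          // Allocate one fixed-size arena that holds all tensors and graph metadata.
          struct ggml_init_params params = {
              .mem_size   = 16*1024*1024,
              .mem_buffer = NULL,
          };
          struct ggml_context * ctx = ggml_init(params);

          // Declare two 4x4 FP32 matrices and their matrix product.
          struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
          struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
          struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);

          // Build the forward graph that ends at c and evaluate it on the CPU.
          struct ggml_cgraph gf = ggml_build_forward(c);
          ggml_graph_compute(ctx, &gf);

          ggml_free(ctx);
          return 0;
      }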

    With this implementation, I can very easily build and run the model with “make base.en”. It also allows me to run it on a wide range of devices. For example, I have provided examples of running the model on an iPhone, a Raspberry Pi 4, and even in a web page via WebAssembly!
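
    Beyond the command-line tool, the library exposes a C API. The following is a rough usage sketch based on the whisper.h header as of this post (some names, e.g. whisper_init, have been revised upstream since, so check the header for the current interface; audio decoding is omitted):

      #include <stdio.h>
      #include "whisper.h"

      int main(void) {
          // Load a ggml model file converted from the original Whisper weights.
          struct whisper_context * ctx = whisper_init("models/ggml-base.en.bin");
          if (ctx == NULL) return 1;

          // Greedy sampling with default parameters.
          struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

          // pcmf32 must hold 16 kHz mono float PCM; a real program would decode a WAV file here.
          float pcmf32[16000] = {0};
          const int n_samples = 16000;

          if (whisper_full(ctx, wparams, pcmf32, n_samples) == 0) {
              const int n = whisper_full_n_segments(ctx);
              for (int i = 0; i < n; i++) {
                  printf("%s\n", whisper_full_get_segment_text(ctx, i));
              }
          }

          whisper_free(ctx);
          return 0;
      }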

    The implementation runs fully on the CPU and makes use of FP16 arithmetic, AVX intrinsics on x86 architectures, and NEON + the Accelerate framework on Apple Silicon. The latter is especially efficient: I observe that inference is about 2-3 times faster compared to the current PyTorch implementation provided by OpenAI when running it on my MacBook M1 Pro. The WASM port utilizes 128-bit SIMD intrinsics - a feature supported in some modern web browsers [1].
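
    As an illustration of the kind of kernel this enables, here is a hypothetical dot product over FP16 weights using AVX2 + FMA + F16C intrinsics. This is a sketch in the spirit of the implementation, not the actual whisper.cpp code; n is assumed to be a multiple of 8. Build with -mavx2 -mfma -mf16c:

      #include <immintrin.h>

      // Weights stored as FP16 (raw 16-bit values), activations as FP32, accumulation in FP32.
      float dot_f16_f32(const unsigned short * x_f16, const float * y, int n) {
          __m256 acc = _mm256_setzero_ps();
          for (int i = 0; i < n; i += 8) {
              // Convert 8 half-precision values to single precision, then fused multiply-add.
              __m256 xv = _mm256_cvtph_ps(_mm_loadu_si128((const __m128i *)(x_f16 + i)));
              __m256 yv = _mm256_loadu_ps(y + i);
              acc = _mm256_fmadd_ps(xv, yv, acc);
          }
          // Horizontal sum of the 8 accumulator lanes.
          __m128 lo = _mm256_castps256_ps128(acc);
          __m128 hi = _mm256_extractf128_ps(acc, 1);
          __m128 s  = _mm_add_ps(lo, hi);
          s = _mm_hadd_ps(s, s);
          s = _mm_hadd_ps(s, s);
          return _mm_cvtss_f32(s);
      }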

    I am very happy with the performance that I observe on Apple Silicon devices. I didn’t expect that the Accelerate framework [2] (i.e. CBLAS) offers such a dramatic performance boost for matrix multiplications so I was very pleasantly surprised! To enable the framework in your C/C++ projects, all you have to do is add `-framework Accelerate` to your clang command-line flags.
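
    For example, a single-precision matrix multiply through Accelerate's CBLAS interface looks like this (C = A*B with row-major MxK and KxN inputs; build with `clang matmul.c -framework Accelerate`):

      #include <Accelerate/Accelerate.h>

      // C (MxN) = A (MxK) * B (KxN), all row-major single precision.
      void matmul(const float * A, const float * B, float * C, int M, int N, int K) {
          cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                      M, N, K,
                      1.0f, A, K,
                      B, N,
                      0.0f, C, N);
      }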

    This entire exercise of implementing the Whisper model was very interesting to me and helped me understand a lot about how the transformer architecture works. I also got a lot of positive feedback from people finding and using my project. We brainstormed a lot of interesting tools that could potentially be built with this library (such as a speech-to-text plugin for Vim, an RPi4 voice assistant, a WASM chat bot, etc.). If interested, check out the “Examples” section and the “Show and tell” discussions for some ideas!

    Would love to know what you think about this project and about your experience with using the Accelerate framework in any of your projects.

  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

  • Enzyme

    High-performance automatic differentiation of LLVM and MLIR. (by EnzymeAD)

  • https://ispc.github.io/ispc.html

    For auto-differentiation, when I need performance or memory efficiency I currently use Tapenade ( http://tapenade.inria.fr:8080/tapenade/index.jsp ) and/or manually written gradients when I need to fuse some kernels, but Enzyme ( https://enzyme.mit.edu/ ) is also very promising (a minimal example follows below).

    MPI for parallelization across machines.
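
    For reference, the canonical Enzyme example from its documentation differentiates a plain C function at the LLVM level; the body of __enzyme_autodiff is synthesized by the Enzyme compiler plugin:

      double square(double x) { return x * x; }

      // Declared, not defined: the Enzyme LLVM plugin generates the gradient code.
      extern double __enzyme_autodiff(void *, double);

      double dsquare(double x) {
          // Returns d(square)/dx evaluated at x, i.e. 2*x.
          return __enzyme_autodiff((void *) square, x);
      }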

  • amx

    Apple AMX Instruction Set

  • You are correct in that those are the four compute options.

    My understanding is that the AMX is more tightly coupled to the CPU, ultimately being accessible via an instruction set (https://github.com/corsix/amx), and it is useful if you need to do matrix multiplications interleaved with other CPU tasks. A common example would be a VIO loop or something where you want that data in the CPU caches.

    The GPU and Neural Engine are not like that: they take some time to set up and initialize. They can also parallelize tasks to a much higher degree. The GPU is the more generalizable of the two, because you can write compute shaders to do anything in parallel, but it uses a lot of resources. I'll have to check out the PR to see how exactly the MPS shaders match up with the task at hand, because you could also consider writing Metal compute shaders by hand (see the sketch after this comment).

    I know the least about the ANE, but it has specific hardware for running ML models, and you have to process the weights ahead of time to make sure they are in the right format. It can run ML models very efficiently and is the most battery friendly.
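
    To make the hand-written compute shader option concrete, here is a hypothetical naive matrix-multiply kernel in the Metal Shading Language (a C++-based language). One thread computes one element of C; the buffer bindings are illustrative, not taken from any real PR:

      #include <metal_stdlib>
      using namespace metal;

      // C (MxN) = A (MxK) * B (KxN); dispatch one thread per output element.
      kernel void matmul_naive(device const float * A [[buffer(0)]],
                               device const float * B [[buffer(1)]],
                               device       float * C [[buffer(2)]],
                               constant     uint  & K [[buffer(3)]],
                               constant     uint  & N [[buffer(4)]],
                               uint2 gid [[thread_position_in_grid]])
      {
          float acc = 0.0f;
          for (uint k = 0; k < K; ++k) {
              acc += A[gid.y * K + k] * B[k * N + gid.x];
          }
          C[gid.y * N + gid.x] = acc;
      }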

  • Halide

    a language for fast, portable data-parallel computation

  • I suggest looking into Halide, as it will make trying different paths much easier (https://halide-lang.org/). A small sketch follows below.

    I haven't looked at your code closely, so I can't say with certainty that it would be the right fit, but it's worth a look.
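
    As a taste of what that looks like, here is a hypothetical Halide pipeline for a matrix multiply. Halide's point is that the algorithm and the schedule (parallelization, vectorization, tiling) are specified separately; the schedule shown is just one possible choice:

      #include "Halide.h"
      using namespace Halide;

      int main() {
          const int K = 512;                          // reduction extent
          ImageParam A(Float(32), 2), B(Float(32), 2);

          Var i("i"), j("j");
          RDom k(0, K);

          // Algorithm: C(i, j) = sum over k of A(i, k) * B(k, j).
          Func C("C");
          C(i, j)  = 0.0f;
          C(i, j) += A(i, k) * B(k, j);

          // Schedule: parallelize across rows, vectorize along columns.
          // Trying a different path is a one-line change here.
          C.parallel(j).vectorize(i, 8);
          C.update().parallel(j).vectorize(i, 8);

          C.compile_jit();
          return 0;
      }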

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++ (by Const-me)

  • Feel free to merge my fork, which is about 20% faster on my computer (Ryzen 7 5700G CPU, medium.en model): https://github.com/Const-me/whisper.cpp It also contains VS2022 projects for building on Windows; your CMake project produces builds with AVX disabled, even though AVX is critical for performance.

    Also, I didn’t really understand your multithreading code in the ggml_graph_compute function, but that custom thread pool implementation looks suspicious to me: just too many atomics. It might be possible to improve it a lot with a better multithreading strategy.
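
    One conventional alternative to busy-waiting on atomics is a pool in which idle workers sleep on a condition variable. The sketch below is purely illustrative of that strategy; it is not whisper.cpp's code, and a real graph scheduler would also need work batching and completion synchronization:

      #include <condition_variable>
      #include <functional>
      #include <mutex>
      #include <queue>
      #include <thread>
      #include <vector>

      class ThreadPool {
      public:
          explicit ThreadPool(size_t n_threads) {
              for (size_t i = 0; i < n_threads; ++i) {
                  workers.emplace_back([this] {
                      for (;;) {
                          std::function<void()> task;
                          {
                              std::unique_lock<std::mutex> lock(m);
                              // Sleep until there is work or the pool is shutting down.
                              cv.wait(lock, [this] { return stop || !tasks.empty(); });
                              if (stop && tasks.empty()) return;
                              task = std::move(tasks.front());
                              tasks.pop();
                          }
                          task(); // run outside the lock
                      }
                  });
              }
          }

          ~ThreadPool() {
              {
                  std::lock_guard<std::mutex> lock(m);
                  stop = true;
              }
              cv.notify_all();
              for (auto & w : workers) w.join();
          }

          void enqueue(std::function<void()> task) {
              {
                  std::lock_guard<std::mutex> lock(m);
                  tasks.push(std::move(task));
              }
              cv.notify_one();
          }

      private:
          std::vector<std::thread> workers;
          std::queue<std::function<void()>> tasks;
          std::mutex m;
          std::condition_variable cv;
          bool stop = false;
      };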
