> What does this mean?
It means you flipped the combobox on the first screen. In the build on GitHub, the only included model implementation is GPU. The other two implementations are disabled with macros here: https://github.com/Const-me/Whisper/blob/1.1.0/Whisper/stdaf... Those implementations lack some UX features like callbacks and cancellation, and I haven't tested them for a while, but they might still work.
> does this use both GPU and CPU simultaneously?
No, it's sequential; there's a data dependency between the two stages. The encode function computes some buffers (probably the "cross attention" tensors, but I'm not sure, not an ML expert), and then the decode function needs that data to generate the output text.
This project is a Windows port of the whisper.cpp implementation: https://github.com/ggerganov/whisper.cpp
Which in turn is a C++ port of OpenAI's Whisper automatic speech recognition (ASR) model: https://github.com/openai/whisper
The implementation has no dependencies, is usually much faster than real time, and should hopefully work on most Windows computers in the world.