whisperX Alternatives
Similar projects and alternatives to whisperX
-
Nim
Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
-
transcribe-anything
Multi-backend whisper app. Blazing fast. Mac-arm optimized. Easy install. Input a local file or url and this service will transcribe it using Whisper AI. Completely private and Free 🤯🤯🤯
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
-
whisperX discussion
whisperX reviews and mentions
-
The Unit Economics of Speech-to-Text Just Collapsed
Look at what arrived between mid-2023 and mid-2025. Gandhi et al.'s Distil-Whisper (2023) distilled large-v2 into a 756M-param student that runs 6× faster with a 1% WER gap on out-of-distribution audio, using large-scale pseudo-labelling. Georgi Gerganov's whisper.cpp made CPU-only and mobile inference a default rather than a party trick; a base.en checkpoint transcribes real-time on an M1 without touching a GPU. Max Bain's WhisperX added forced-alignment and diarization on top, so word-level timestamps and speaker labels stopped being a premium-tier differentiator.
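The "WER gap" quoted above compares word error rates between two models. As a minimal sketch of how WER is commonly computed (word-level edit distance divided by the number of reference words), assuming simple lowercase-and-split normalization; the function name and example strings are hypothetical and not taken from the article:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical outputs from two models on the same reference sentence.
reference = "the cat sat on the mat"
teacher_wer = word_error_rate(reference, "the cat sat on the mat")
student_wer = word_error_rate(reference, "the cat sat on a mat")
wer_gap = student_wer - teacher_wer
```

Real evaluations usually apply heavier text normalization (punctuation, numerals, casing) before scoring, so figures from this sketch are not comparable to published numbers.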
-
Dell's CES 2026 chat was the most pleasingly un-AI briefing I've had in 5 years
Yes, if you check their community integrations section on faster-whisper [0], you can see a lot of different CLIs, GUIs, and libraries. I recommend WhisperX [1], it's the most complete CLI so far and has features like diarization which whisper.cpp does not have in a production-ready capacity.
[0] https://github.com/SYSTRAN/faster-whisper#community-integrat...
[1] https://github.com/m-bain/whisperX
-
A beginner's guide to the Whisperx-A40-Large model by Victor-Upmeet on Replicate
The whisperx-a40-large model is an accelerated version of the popular Whisper automatic speech recognition (ASR) model. Developed by Victor Upmeet, it provides fast transcription with word-level timestamps and speaker diarization. This model builds upon the capabilities of Whisper, which was originally created by OpenAI, and incorporates optimizations from the WhisperX project for improved performance.
-
Go Away Python
I've actually done a fair bit of ML work in Elixir, in practice I found:
1) It's generally harder to interface with existing libraries and models (example: whisperX [0] is a library that combines generic whisper speech recognition models with some additional tools like dynamic time warping to create a transcription with more accurate time stamp alignment - something that was very helpful when generating subtitles. But because most of this logic just lives in the python library, using this in Elixir requires writing a lot more tooling around the existing bumblebee whisper implementation [1]).
but,
2) It's way easier to ship models I built and trained entirely with Elixir's ML ecosystem - EXLA, NX, Bumblebee. I trained a few models doing basic visual recognition tasks (detecting scene transitions, credits, title cards, etc), using the existing CLIP model as a visual frontend and then training a small classifier on the output of CLIP. It was pretty straightforward to do with Elixir, and I love that I can run the same exact code on my laptop and server without dealing with lots of dependencies and environment issues.
Livebook is also incredibly nice, my typical workflow has become prototyping things in Livebook with some custom visualization tools that I made and then just connecting to a livebook instance running on EC2 to do the actual training run. From there shipping and using the model is seamless, and I just publish the wrapping module as a library on our corporate github, which lets anyone else import it straight into livebook and use it.
[0] https://github.com/m-bain/whisperX
[1] https://hexdocs.pm/bumblebee/Bumblebee.Audio.Whisper.html
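To illustrate the subtitle use case mentioned in point 1, here is a minimal sketch that turns timestamped segments into SubRip (SRT) text. It assumes each segment is a dict with "start" and "end" in seconds and a "text" string; that shape and both function names are assumptions for illustration, not an API from whisperX or Bumblebee:

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical aligned segments.
example_srt = segments_to_srt([
    {"start": 0.0, "end": 1.25, "text": " Hello there. "},
    {"start": 1.5, "end": 3.0, "text": "Welcome back."},
])
```

The more accurate the start and end times fed into a formatter like this, the less the subtitles drift from the speech, which is why alignment quality matters for this use case.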
-
Making AI Models Faster, Cheaper, and Greener: Here's How
2.3X speed improvement over WhisperX and a 3X speed boost compared to HuggingFace Pipeline with FlashAttention 2 (Insanely Fast Whisper)
-
FFmpeg 8.0 adds Whisper support
-
Ask HN: What Speaker Diarization tools should I look into?
I am building VideoToBe.com - I have found that whisperX works the most reliably.
https://github.com/m-bain/whisperX
It is built on top of OpenAI Whisper, so speech recognition is good, and the transcript gives speaker tags as 'SPEAKER_00', 'SPEAKER_01', etc.
Here is what the transcript may look like:
https://videotobe.com/play/media/1b02f75a-9503-43aa-8956-d18...
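As a rough sketch of how speaker-tagged output like this can be turned into a readable transcript, the snippet below merges consecutive segments from the same speaker into one line. The segment shape (dicts with "speaker" and "text" keys), the "UNKNOWN" fallback, and the function name are assumptions for illustration; only the 'SPEAKER_00' style labels come from the comment above:

```python
def render_transcript(segments) -> str:
    """Group consecutive segments by speaker into 'SPEAKER_XX: text' lines."""
    lines = []
    current_speaker = None
    buffer = []
    for seg in segments:
        # Segments with no assigned speaker get a placeholder label.
        speaker = seg.get("speaker", "UNKNOWN")
        if speaker != current_speaker and buffer:
            # Speaker changed: flush what the previous speaker said.
            lines.append(f"{current_speaker}: {' '.join(buffer)}")
            buffer = []
        current_speaker = speaker
        buffer.append(seg["text"].strip())
    if buffer:
        lines.append(f"{current_speaker}: {' '.join(buffer)}")
    return "\n".join(lines)


# Hypothetical diarized segments.
example_transcript = render_transcript([
    {"speaker": "SPEAKER_00", "text": "Hi there."},
    {"speaker": "SPEAKER_00", "text": "How are you?"},
    {"speaker": "SPEAKER_01", "text": "Fine, thanks."},
])
```

Mapping the generic labels to real names would be a separate step, for example a small dict lookup applied before rendering.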
-
Ask HN: What API or software are people using for transcription?
I use whisperfile[1] directly. The whisper-large-v3 model seems good with non-English transcription, which is my main use-case.
I am also eyeing whisperX[2], because I want to play some more with speaker diarization.
Your use-case seems to be batch transcription, so I'd suggest you go ahead and just use whisperfile, it should work well on an M4 mini, and it also has an HTTP API if you just start it without arguments.
If you want more interactivity, I have been using Vibe[3] as an open-source replacement of SuperWhisper[4], but VoiceInk from a sibling comment seems better.
Aside: It seems that so many of the mentioned projects use whisper at the core, that it would be interesting to explicitly mark the projects that don't use whisper, so we can have a real fundamental comparison.
[1] https://huggingface.co/Mozilla/whisperfile
[2] https://github.com/m-bain/whisperX
[3] https://github.com/thewh1teagle/vibe/
[4] https://superwhisper.com/
-
Ask HN: Is Whisper Still Relevant?
Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX
-
Show HN: Mikey - No bot meeting notetaker for Windows
https://github.com/m-bain/whisperX looks promising - I'm hacking away on an always-on transcriber for my notes for later search&recall. It has support for diarization (the speaker detection you're looking for).
I'm currently hacking away on a mix of https://github.com/speaches-ai/speaches + https://github.com/ufal/whisper_streaming though - mostly because my laptop doesn't have a decent GPU, I stream the audio to a home server instead.
But overall it's pretty simple to do after you wrangle the Python dependencies - all you need is a sink for the text files (for example, create a new file for every Teams meeting, but that's another story...)
-
Stats
m-bain/whisperX is an open source project licensed under the BSD 2-Clause "Simplified" License, which is an OSI-approved license.
The primary programming language of whisperX is Python.