Top 23 Python speech-to-text Projects
-
Project mention: I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server | dev.to | 2026-05-09
Transcribes it locally using faster-whisper
-
Look at what arrived between mid-2023 and mid-2025. Gandhi et al.'s Distil-Whisper (2023) distilled Whisper large-v2 into a 756M-parameter student that runs 6× faster and stays within 1% WER on out-of-distribution audio, using large-scale pseudo-labelling. Georgi Gerganov's whisper.cpp made CPU-only and mobile inference a default rather than a party trick; a base.en checkpoint transcribes in real time on an M1 without touching a GPU. Max Bain's WhisperX added forced alignment and diarization on top, so word-level timestamps and speaker labels stopped being a premium-tier differentiator.
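The "1% WER" figure is a word error rate: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model's hypothesis, divided by the number of reference words. A minimal sketch of the metric in plain Python, assuming simple whitespace tokenization and no text normalization (published benchmarks typically lowercase and strip punctuation before scoring):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(f"WER: {word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.3f}")
```

Because the denominator is the reference length, insertions can push WER above 1.0, which is why it is an error rate rather than an accuracy percentage.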
-
-
voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
Project mention: Show HN: Likes/day as fake profile - built my own dating app in 100 days | news.ycombinator.com | 2025-12-16
RealtimeSTT
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
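Voice activity detection decides which stretches of audio are worth sending to the recognizer at all. RealtimeSTT relies on its own, more robust VAD; the sketch below swaps in a much simpler technique, an RMS-energy threshold with a short "hangover" so brief pauses do not split an utterance. It assumes mono float samples in [-1, 1] at 16 kHz, and the threshold and hangover values are arbitrary illustrations, not the library's settings.

```python
import math
from typing import List, Optional, Tuple


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame; a trailing partial frame is dropped."""
    return [
        math.sqrt(sum(s * s for s in samples[start:start + frame_len]) / frame_len)
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples: List[float], frame_len: int = 160,
                  threshold: float = 0.02, hangover: int = 3) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) speech spans, end exclusive.

    A segment stays open for up to `hangover` quiet frames, so short pauses
    between words do not break one utterance into several.
    """
    energies = frame_rms(samples, frame_len)
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the currently open segment
    quiet = 0                    # consecutive quiet frames inside the open segment
    for idx, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = idx
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover:
                # Close the segment just after its last voiced frame.
                segments.append((start, idx - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        segments.append((start, len(energies) - quiet))
    return segments


if __name__ == "__main__":
    rate, frame_len = 16_000, 160  # 10 ms frames at 16 kHz
    tone = [0.5 * math.sin(2 * math.pi * 500 * n / rate) for n in range(frame_len)]
    silence = [0.0] * frame_len
    # 5 quiet frames, 10 voiced, a 2-frame pause, 5 voiced, 10 quiet.
    signal = silence * 5 + tone * 10 + silence * 2 + tone * 5 + silence * 10
    print(detect_speech(signal, frame_len=frame_len))
```

The 2-frame pause is shorter than the hangover, so both voiced runs come back as one span; a fixed energy threshold like this breaks down in noisy rooms, which is where model-based VADs earn their keep.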
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
-
SenseVoice
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
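"Non-autoregressive" means the model scores every audio frame in one forward pass instead of generating tokens one at a time, each conditioned on the previous ones, as Whisper's decoder does; that is where much of the speed comes from. One common way to turn such frame-level scores into text is CTC greedy decoding, sketched below. Treating SenseVoice's output as CTC-style is an assumption of this sketch, and the blank index, vocabulary, and scores are toy values.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of the vocabulary is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocab: List[str]) -> str:
    """Take the best label per frame, merge consecutive repeats, then drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    output = []
    previous = None
    for label in best:
        # A label is emitted only when it differs from the previous frame's label,
        # so a blank between two identical labels lets a real repeat through.
        if label != previous and label != BLANK:
            output.append(vocab[label])
        previous = label
    return "".join(output)


if __name__ == "__main__":
    vocab = ["<blank>", "c", "a", "t"]
    scores = [
        [0.1, 0.8, 0.05, 0.05],   # c
        [0.2, 0.7, 0.05, 0.05],   # c again: merged with the previous frame
        [0.9, 0.05, 0.03, 0.02],  # blank
        [0.1, 0.1, 0.7, 0.1],     # a
        [0.1, 0.1, 0.1, 0.7],     # t
    ]
    print(ctc_greedy_decode(scores, vocab))
```

Every frame's decision is independent, so the whole sequence can be decoded in parallel on the accelerator and collapsed cheaply afterwards.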
-
mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Project mention: The Free, Open-Source Alternative to ElevenLabs Is Finally Here | dev.to | 2026-05-24
uv pip install "git+https://github.com/Blaizzy/mlx-audio" --prerelease=allow
uv pip install soundfile
-
-
Project mention: How GPU-Powered Coding Agents Can Assist in Development of GPU-Accelerated Software | dev.to | 2026-02-28
Imagine owning a massive Plex media library with hundreds of foreign-language films and TV shows. You want subtitles for everything, but manually sourcing them is a nightmare: mismatched timings, missing translations, incomplete coverage. Tools like Bazarr exist specifically to automate subtitle management for Plex and Sonarr/Radarr libraries, and they ship with built-in integration for whisper-asr-webservice, a self-hosted REST API that wraps OpenAI's Whisper speech recognition model. Point Bazarr at a whisper-asr-webservice endpoint, and it will automatically transcribe and generate subtitles for every piece of media in your library, in any language Whisper supports.
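Generating a subtitle file ultimately means emitting timed cues. A minimal sketch of that last step, assuming the transcriber yields (start, end, text) segments in seconds (a hypothetical shape chosen for illustration, not whisper-asr-webservice's internal representation), renders them as SRT, the numbered `HH:MM:SS,mmm --> HH:MM:SS,mmm` format most media players read:

```python
from typing import Iterable, Tuple

# Hypothetical segment shape for this sketch: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    print(segments_to_srt([(0.0, 2.5, "Hello there."), (2.5, 4.0, "Subtitles, generated locally.")]))
```

SRT uses a comma, not a period, before the milliseconds; getting that wrong is a common reason players silently ignore a generated file.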
-
LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Project mention: Why removing 'um' from a recording is harder than it sounds | news.ycombinator.com | 2026-06-12
Yeah, that's in faster-whisper-xxl via the --diarize parameter with additional options to tweak how it works:
https://github.com/Purfview/whisper-standalone-win/discussio...
I haven't used it when subtitling, though, so I don't know much more.
-
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
There is also: https://github.com/linto-ai/whisper-timestamped
It doesn't use an extra model (so it supports every language that works with Whisper out of the box and uses less memory); instead, it applies Dynamic Time Warping to the cross-attention weights.
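The idea described above: treat the decoder's cross-attention weights as a token-by-audio-frame similarity matrix and find the best monotonic path through it with Dynamic Time Warping, so each token is assigned a contiguous run of frames. A minimal sketch of that idea, not whisper-timestamped's code: the attention matrix is a hand-made toy (real implementations first select and smooth specific attention heads), and the 20 ms per frame assumes Whisper's encoder resolution of 1500 frames per 30 s window.

```python
from typing import Dict, List, Sequence, Tuple

FRAME_SECONDS = 0.02  # assumption: 1500 encoder frames per 30 s window


def dtw_path(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost monotonic path from (0, 0) to (n-1, m-1) through an n x m cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of reaching cell (i-1, j-1); row/col 0 are padding.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the bottom-right corner; ties prefer the diagonal step.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1][j - 1], (i - 1, j): acc[i - 1][j], (i, j - 1): acc[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path


def token_frame_spans(attention: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """(first_frame, last_frame) per token, aligning on negated attention weights."""
    # High attention should be cheap to traverse, so the cost is the negated weight.
    cost = [[-weight for weight in row] for row in attention]
    spans: Dict[int, Tuple[int, int]] = {}
    for token, frame in dtw_path(cost):
        first, last = spans.get(token, (frame, frame))
        spans[token] = (min(first, frame), max(last, frame))
    return [spans[token] for token in range(len(attention))]


if __name__ == "__main__":
    # Toy cross-attention: 3 tokens (rows) x 6 audio frames (columns).
    attention = [
        [1.0, 0.75, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.75, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0, 0.75],
    ]
    for token, (first, last) in enumerate(token_frame_spans(attention)):
        start, end = first * FRAME_SECONDS, (last + 1) * FRAME_SECONDS
        print(f"token {token}: {start:.2f}s -> {end:.2f}s")
```

Because the path is monotonic and touches every row, each token gets at least one frame and timings never run backwards, which is what makes this usable for word-level timestamps without a separate alignment model.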
-
-
-
airunner
Offline inference engine for art, real-time voice conversations, LLM-powered chatbots and automated workflows
-
whisper-ctranslate2
Whisper command-line client based on CTranslate2, compatible with the original OpenAI client.
-
StreamSpeech
StreamSpeech is an "All in One" seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
-
-
-
Python speech-to-text discussion
Python speech-to-text related posts
-
Why removing 'um' from a recording is harder than it sounds
-
I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server
-
Stop paying for AI transcription! WritHer: 100% Local Voice Assistant for Windows. Privacy-first, Whisper + Ollama powered. Open Source on GitHub!
-
The Unit Economics of Speech-to-Text Just Collapsed
-
Show HN: I built a sub-500ms latency voice agent from scratch
-
I Built a Voice Assistant That Runs Entirely in Your Browser
-
I Made Tkinter Look Like a Modern Glassmorphic App: Here's the Dark Magic I Used
-
A note from our sponsor - SaaSHub
www.saashub.com | 13 Jun 2026
Index
What are some of the best open-source speech-to-text projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | faster-whisper | 23,557 |
| 2 | whisperX | 22,445 |
| 3 | pyvideotrans | 17,918 |
| 4 | speechbrain | 11,610 |
| 5 | voice-pro | 10,931 |
| 6 | RealtimeSTT | 9,874 |
| 7 | SpeechRecognition | 8,970 |
| 8 | SenseVoice | 8,522 |
| 9 | mlx-audio | 7,345 |
| 10 | speech-to-speech | 4,873 |
| 11 | whisper-asr-webservice | 3,281 |
| 12 | LLaMA-Omni | 3,141 |
| 13 | whisper-standalone-win | 3,055 |
| 14 | lingvo | 2,864 |
| 15 | whisper-timestamped | 2,818 |
| 16 | kalliope | 1,753 |
| 17 | Dragonfire | 1,409 |
| 18 | airunner | 1,315 |
| 19 | whisper-ctranslate2 | 1,314 |
| 20 | StreamSpeech | 1,270 |
| 21 | quillman | 1,205 |
| 22 | dc_tts | 1,158 |
| 23 | AI-Waifu-Vtuber | 1,062 |