SaaSHub helps you find the best software and product alternatives
Top 23 Python speech-recognition Projects
-
transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Project mention: The $100 ChatGPT: Why Karpathy's nanochat Represents the Next Big Thing | dev.to | 2026-05-04
Hugging Face Transformers: 500,000+ lines
-
Project mention: I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server | dev.to | 2026-05-09
Transcribes it locally using faster-whisper
-
Look at what arrived between mid-2023 and mid-2025. Gandhi et al.'s Distil-Whisper (2023) distilled large-v2 into a 756M-parameter student that runs about 6× faster while staying within 1% WER of the teacher on out-of-distribution audio, trained with large-scale pseudo-labelling. Georgi Gerganov's whisper.cpp made CPU-only and mobile inference a default rather than a party trick; a base.en checkpoint transcribes in real time on an M1 without touching a GPU. Max Bain's WhisperX added forced alignment and diarization on top, so word-level timestamps and speaker labels stopped being a premium-tier differentiator.
-
FunASR
Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Project mention: CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution | dev.to | 2025-12-15
FunASR - Automatic Speech Recognition
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
Star the Speech Brain repository ⭐
-
voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
Project mention: Show HN: Likes/day as fake profile → built my own dating app in 100 days | news.ycombinator.com | 2025-12-16
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
-
SenseVoice
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
-
mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Project mention: The Free, Open-Source Alternative to ElevenLabs Is Finally Here | dev.to | 2026-05-24
uv pip install "git+https://github.com/Blaizzy/mlx-audio" --prerelease=allow
uv pip install soundfile
-
Project mention: CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution | dev.to | 2025-12-15
WeNet - Speech Recognition Toolkit
-
Project mention: Porcupine – On-device wake word detection powered by deep learning | news.ycombinator.com | 2026-03-05
-
Well, no ... For a start any "AI" course 20 years ago probably wouldn't have even mentioned neural nets, and certainly not as a mainstream technique.
A 20yr old "AI" curriculum would have looked more like the 3rd edition of Russell & Norvig's "Artificial Intelligence - A Modern Approach".
https://github.com/yanshengjia/ml-road/blob/master/resources...
Karpathy's videos aren't an AI (except in modern sense of AI=LLMs) course, or even a machine learning course, or even a neural network course for that matter (despite the title) - it's really just "From Zero to LLMs".
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
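For readers unfamiliar with the metric behind the "within 1% word error rate" claim, here is a minimal sketch of how WER is commonly computed: word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not code from the distil-whisper repository, and real evaluations usually normalize text (casing, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, made up for illustration only.
teacher_wer = word_error_rate("the cat sat on the mat", "the cat sat on the mat")
student_wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"teacher WER: {teacher_wer:.3f}, student WER: {student_wer:.3f}")
```

A "1% gap" in this sense means the student's WER is at most one percentage point higher than the teacher's on the same test set.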
-
Project mention: How GPU-Powered Coding Agents Can Assist in Development of GPU-Accelerated Software | dev.to | 2026-02-28
Imagine owning a massive Plex media library with hundreds of foreign-language films and TV shows. You want subtitles for everything, but manually sourcing them is a nightmare — mismatched timings, missing translations, incomplete coverage. Tools like Bazarr exist specifically to automate subtitle management for Plex and Sonarr/Radarr libraries, and they ship with built-in integration for whisper-asr-webservice — a self-hosted REST API that wraps OpenAI's Whisper speech recognition model. Point Bazarr at a whisper-asr-webservice endpoint, and it will automatically transcribe and generate subtitles for every piece of media in your library, in any language Whisper supports.
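To make the "transcribe and generate subtitles" step concrete, the sketch below turns timed transcript segments into SubRip (SRT) text. The `(start_seconds, end_seconds, text)` tuple layout and both function names are assumptions chosen for illustration; they are not the actual output format or code of Bazarr or whisper-asr-webservice.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cue blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cue blocks are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical segments, standing in for a speech recognizer's output.
example_segments = [(0.0, 1.5, " Hello "), (1.5, 3.0, "world")]
print(segments_to_srt(example_segments))
```

Any tool that produces segment-level start and end times could feed a formatter like this one.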
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Project mention: Why removing 'um' from a recording is harder than it sounds | news.ycombinator.com | 2026-06-12
Yeah, that's in faster-whisper-xxl via the --diarize parameter with additional options to tweak how it works:
https://github.com/Purfview/whisper-standalone-win/discussio...
I haven't used it when subtitling, though, so I don't know much more.
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
There is also: https://github.com/linto-ai/whisper-timestamped
It doesn't use an extra model (so it supports every language that works with Whisper out of the box and uses less memory); it works by applying Dynamic Time Warping to cross-attention weights.
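As a rough illustration of the idea, the sketch below runs plain Dynamic Time Warping over a toy token-by-frame matrix. It assumes the cost of pairing a token with an audio frame is `1 - attention weight`, and the attention values are invented. This is a generic DTW sketch, not whisper-timestamped's actual implementation, which works on real model attention and is likely more involved.

```python
import math


def dtw_path(cost):
    """Return the minimum-cost monotonic path through a cost matrix."""
    n, m = len(cost), len(cost[0])
    # acc[i][j] = cheapest cumulative cost of aligning the first i rows
    # with the first j columns; the extra border row/column is infinite.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )

    # Backtrack from the bottom-right corner, always taking the cheapest
    # predecessor (diagonal, up, or left).
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return path


# Toy attention: rows are 3 text tokens, columns are 5 audio frames.
# The values are made up for illustration.
attention = [
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.8, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.8, 0.9],
]
cost = [[1.0 - weight for weight in row] for row in attention]
path = dtw_path(cost)
print(path)  # (token index, frame index) pairs
```

On this toy input the path assigns frames 0 and 1 to the first token, frame 2 to the second, and frames 3 and 4 to the third. Multiplying frame indices by the frame duration would give approximate word-level timestamps.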
-
lip-reading-deeplearning
:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
-
Python speech-recognition related posts
-
I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server
-
Deep Dive: OpenAI Whisper 2.0 vs. Deepgram 2.0 for Code Transcription in 2026
-
The Unit Economics of Speech-to-Text Just Collapsed
-
How to Migrate from Deprecated VAPI Transcriber Endpoints to Deepgram v2 in Retell AI Agents
-
mlx-audio: Speech Processing Library on Apple Silicon
-
I Built a Voice Assistant That Runs Entirely in Your Browser
-
How GPU-Powered Coding Agents Can Assist in Development of GPU-Accelerated Software
-
A note from our sponsor - SaaSHub
www.saashub.com | 18 Jun 2026
Index
What are some of the best open-source speech-recognition projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | transformers | 161,558 |
| 2 | faster-whisper | 23,557 |
| 3 | whisperX | 22,445 |
| 4 | FunASR | 17,920 |
| 5 | PaddleSpeech | 12,614 |
| 6 | speechbrain | 11,610 |
| 7 | voice-pro | 10,931 |
| 8 | espnet | 9,858 |
| 9 | SpeechRecognition | 8,970 |
| 10 | SenseVoice | 8,601 |
| 11 | mlx-audio | 7,345 |
| 12 | wenet | 5,138 |
| 13 | Porcupine | 4,851 |
| 14 | ml-road | 4,825 |
| 15 | distil-whisper | 4,065 |
| 16 | whisper-asr-webservice | 3,281 |
| 17 | whisper-standalone-win | 3,080 |
| 18 | lingvo | 2,864 |
| 19 | whisper-timestamped | 2,818 |
| 20 | lip-reading-deeplearning | 1,903 |
| 21 | kalliope | 1,753 |
| 22 | SALMONN | 1,449 |
| 23 | SpeechT5 | 1,440 |