Top 23 Python speech-recognition Projects
-
transformers
Project mention: None of the top 10 projects in GitHub is actually a software project 🤯 | dev.to | 2025-05-10
We see an addition to the AI community with AutoGPT. Along with Tensorflow they represent the AI community in the software category, which is getting relevant (2 out of 8). We can expect in the future to have new AI projects in the top 25 such as Transformers or Ollama (currently top 34 and 36, respectively).
-
faster-whisper
Project mention: Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model | news.ycombinator.com | 2024-10-14
Hi, I don't know what's SOTA, but I got good results with these (open source, on-device) :
https://github.com/SYSTRAN/faster-whisper (speech-to-text)
-
whisperX
Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12
Apparently not. See https://github.com/lifeiteng/OmniSenseVoice/blob/main/src/om.... See also FunASR running SenseVoice but using Kaldi for speaker identification https://github.com/modelscope/FunASR/blob/cd684580991661b9a0...
-
speechbrain
Simple Diarizer is a speaker diarization library that uses pretrained models from SpeechBrain.
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
-
SenseVoice
Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12
I mean they make a bold statement up top just to paddle back a little bit further down with: "[…] In terms of Chinese and Cantonese recognition, the SenseVoice-Small model has advantages."
It feels dishonest to me.
[0] https://github.com/FunAudioLLM/SenseVoice?tab=readme-ov-file...
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Details will be shared tomorrow, but from what I have read they have distilled the large model decoder into this turbo that only has 4 layers instead of 32, the encoder should remain the same size. Similar to https://github.com/huggingface/distil-whisper but the model is distilled using multilingual data instead of just English, and the decoder is 4 layers instead of 2.
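The "within 1% word error rate" figure in the distil-whisper description refers to WER, which is conventionally computed as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. A minimal sketch of that calculation, assuming lowercase normalization and whitespace tokenization (the `word_error_rate` helper is hypothetical and not part of distil-whisper or any library listed here):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase + whitespace split is enough normalization for a sketch.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from output)
                curr[j - 1] + 1,     # insertion (extra word in output)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Published benchmarks usually apply heavier text normalization (punctuation, numerals, spelling variants) before scoring, so numbers from this sketch are not directly comparable to reported figures.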
-
voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
Project mention: Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊 | dev.to | 2025-02-10
GitHub: https://github.com/abus-aikorea/voice-pro
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win
It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...
Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.
-
lip-reading-deeplearning
:unlock: Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
Python speech-recognition related posts
-
Run LLMs on Apple Neural Engine (ANE)
-
The Technology Behind YouTube’s Auto-Captioning System
-
Amazon Is Discontinuing the "Do Not Send Voice Recordings" Feature on Echo
-
The FFT Strikes Back: An Efficient Alternative to Self-Attention
-
Ask HN: Is Whisper Still Relevant?
-
Transcriber AI – Free, end-to-end machine based transcription with speaker id
-
Show HN: Voice-Pro – AI Voice Cloning Magic: Transform Any Voice in 15 Seconds
Index
What are some of the best open-source speech-recognition projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 144,045 |
2 | faster-whisper | 15,829 |
3 | whisperX | 15,599 |
4 | PaddleSpeech | 11,853 |
5 | FunASR | 10,298 |
6 | speechbrain | 9,777 |
7 | espnet | 9,067 |
8 | SpeechRecognition | 8,712 |
9 | SenseVoice | 5,578 |
10 | wenet | 4,490 |
11 | Porcupine | 4,088 |
12 | distil-whisper | 3,845 |
13 | voice-pro | 3,647 |
14 | lingvo | 2,838 |
15 | whisper-asr-webservice | 2,566 |
16 | whisper-timestamped | 2,393 |
17 | whisper-standalone-win | 2,040 |
18 | lip-reading-deeplearning | 1,870 |
19 | kalliope | 1,728 |
20 | Dragonfire | 1,383 |
21 | SpeechT5 | 1,336 |
22 | SALMONN | 1,220 |
23 | SincNet | 1,171 |