Top 23 Python ASR Projects
-
Yes, it's still relevant, but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX
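WhisperX adds word-level timestamps and speaker diarization on top of Whisper. A core step in combining the two is labelling each transcribed word with a speaker. The sketch below shows that idea with plain dataclasses and a maximum-overlap rule. It is an illustration under our own assumptions: the `Word`/`Turn` types and the `assign_speakers` helper are hypothetical, not WhisperX's actual code or API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    speaker: Optional[str] = None

@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, or 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most."""
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > best_overlap:
                best, best_overlap = t.speaker, ov
        w.speaker = best  # stays None when no turn overlaps the word
    return words

words = [Word("hello", 0.0, 0.4), Word("so", 0.9, 1.3), Word("hi", 1.4, 1.6)]
turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]
for w in assign_speakers(words, turns):
    print(w.speaker, w.text)
```

The word "so" straddles the turn boundary; it goes to SPEAKER_01 because 0.3 s of it falls in the second turn versus 0.1 s in the first. Maximum overlap is the simplest rule; a real pipeline may also need to handle overlapping speech.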
-
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NVIDIA NeMo: To perform speaker diarization using NVIDIA NeMo, follow these steps:
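Whichever configuration you use, a diarization run ends in a list of speaker turns, and NeMo's diarization pipeline can write these out as RTTM files: a plain-text format with one `SPEAKER` line per turn (file ID, channel, onset, duration and speaker label, with unused fields set to `<NA>`). A small standard-library reader for that output might look like the sketch below. The sample lines are made up, and `parse_rttm`/`talk_time` are hypothetical helpers, not part of NeMo.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def parse_rttm(lines: Iterable[str]) -> List[Tuple[str, float, float]]:
    """Parse RTTM SPEAKER records into (speaker, start, end) tuples, sorted by start time."""
    turns = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and other record types
        # Field layout: type, file id, channel, onset, duration, <NA>, <NA>, speaker, ...
        onset, duration = float(fields[3]), float(fields[4])
        turns.append((fields[7], onset, onset + duration))
    return sorted(turns, key=lambda t: t[1])

def talk_time(turns: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Total speaking time per speaker, in seconds."""
    totals: Dict[str, float] = defaultdict(float)
    for speaker, start, end in turns:
        totals[speaker] += end - start
    return dict(totals)

# Made-up sample in RTTM layout
sample = """\
SPEAKER meeting 1 0.00 2.50 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER meeting 1 2.50 1.25 <NA> <NA> speaker_1 <NA> <NA>
SPEAKER meeting 1 3.75 0.75 <NA> <NA> speaker_0 <NA> <NA>
"""
turns = parse_rttm(sample.splitlines())
print(talk_time(turns))  # {'speaker_0': 3.25, 'speaker_1': 1.25}
```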
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
Simple Diarizer: Simple Diarizer is a speaker diarization library that uses pretrained models from SpeechBrain. To get started with simple_diarizer, follow these steps:
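Diarizers built on pretrained speaker-embedding models, as Simple Diarizer is with SpeechBrain, generally follow an embed-then-cluster pattern: compute one embedding per speech segment, then group segments whose embeddings are similar. The sketch below uses toy 3-dimensional vectors in place of real embeddings and a greedy, threshold-based centroid clustering, which is simpler than the agglomerative or spectral clustering real diarizers typically use. `cluster_segments` and the 0.8 threshold are illustrative assumptions, not Simple Diarizer's API.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cluster_segments(embeddings: List[Sequence[float]], threshold: float = 0.8) -> List[int]:
    """Greedy clustering: join the most similar speaker centroid, or start a new speaker."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for emb in embeddings:
        sims = [cosine(emb, c) for c in centroids]
        if sims and max(sims) >= threshold:
            k = sims.index(max(sims))
            n = counts[k]
            # running mean keeps the centroid representative of every member so far
            centroids[k] = [(c * n + e) / (n + 1) for c, e in zip(centroids[k], emb)]
            counts[k] += 1
        else:
            k = len(centroids)
            centroids.append(list(emb))
            counts.append(1)
        labels.append(k)
    return labels

# Toy 3-d stand-ins for per-segment speaker embeddings
segments = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 1.0, 0.1], [1.0, 0.0, 0.1], [0.1, 0.9, 0.0]]
print(cluster_segments(segments))  # [0, 0, 1, 0, 1]
```

The threshold trades off splitting one speaker into several (too high) against merging different speakers (too low); raising it to 0.999 here gives every segment its own label.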
-
Project mention: Omni SenseVoice: High-Speed Speech Recognition with Words Timestamps | news.ycombinator.com | 2024-10-12
I mean, they make a bold statement up top, only to backpedal a little further down with: "[…] In terms of Chinese and Cantonese recognition, the SenseVoice-Small model has advantages."
It feels dishonest to me.
[0] https://github.com/FunAudioLLM/SenseVoice?tab=readme-ov-file...
-
nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.
-
youtube-transcript-api
This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike other Selenium-based solutions!
First, I had to get the data. I thought this would be a good place to use a Google Cloud API that can extract YouTube transcripts. But while setting up this service, I realized there was an even easier way: extracting the auto-generated captions directly from YouTube. (Thank you to jdepoix for the library that does this: https://github.com/jdepoix/youtube-transcript-api).
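Once the captions are downloaded, they usually need flattening before further processing. The sketch below assumes the raw transcript form: a list of dicts with `text`, `start` and `duration` keys, which is the shape the library's older `get_transcript` call returned (newer releases wrap snippets in objects, so check your version). Nothing here calls the library itself, and the helper names are hypothetical.

```python
from typing import Dict, List

def fmt_clock(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS for videos longer than an hour."""
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"

def clean(text: str) -> str:
    # auto-generated captions often contain line breaks inside one snippet
    return " ".join(text.split())

def to_plain_text(entries: List[Dict]) -> str:
    return " ".join(clean(e["text"]) for e in entries)

def to_timestamped_lines(entries: List[Dict]) -> List[str]:
    return [f"[{fmt_clock(e['start'])}] {clean(e['text'])}" for e in entries]

# Made-up entries in the assumed raw shape
entries = [
    {"text": "welcome back to\nthe channel", "start": 0.0, "duration": 2.1},
    {"text": "today we look at ASR", "start": 2.1, "duration": 1.9},
    {"text": "let's get started", "start": 65.4, "duration": 1.5},
]
print(to_plain_text(entries))
print("\n".join(to_timestamped_lines(entries)))
```

Plain text suits downstream text processing; the timestamped form keeps a way back to the position in the video.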
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
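Per-word confidence is useful for flagging words worth a manual check. A minimal sketch, assuming the result is a dict with `segments`, each holding a `words` list whose entries have `text`, `start`, `end` and `confidence` keys; that matches whisper-timestamped's JSON output as we understand it, but verify against your version. The helper name and the 0.5 threshold are illustrative choices.

```python
from typing import Dict, List, Tuple

def low_confidence_words(result: Dict, threshold: float = 0.5) -> List[Tuple[float, float, str, float]]:
    """Return (start, end, text, confidence) for every word scored below the threshold."""
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["confidence"] < threshold:
                flagged.append((word["start"], word["end"], word["text"], word["confidence"]))
    return flagged

# Made-up result in the assumed shape
result = {"segments": [
    {"words": [
        {"text": "The", "start": 0.00, "end": 0.18, "confidence": 0.97},
        {"text": "quick", "start": 0.18, "end": 0.42, "confidence": 0.91},
        {"text": "brwn", "start": 0.42, "end": 0.70, "confidence": 0.31},
    ]},
    {"words": [
        {"text": "fox", "start": 0.80, "end": 1.05, "confidence": 0.88},
    ]},
]}
print(low_confidence_words(result))  # [(0.42, 0.7, 'brwn', 0.31)]
```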
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win
It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...
Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support, but that's easily scripted.
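Scripting wildcard support amounts to expanding the pattern yourself and invoking the executable once per file. A minimal Python sketch: the executable path is a placeholder, and the only assumption about the command line is that the input file is the first argument; pass any other options your build supports through `extra_args`.

```python
import glob
import subprocess
import sys
from typing import List, Sequence

# Placeholder path (assumption): point this at the executable you downloaded.
EXE = r"C:\tools\whisper-standalone-win\faster-whisper.exe"

def expand(pattern: str) -> List[str]:
    """Expand a wildcard such as D:\\audio\\*.mp3 into a sorted list of files."""
    return sorted(glob.glob(pattern))

def build_commands(paths: Sequence[str], exe: str = EXE, extra_args: Sequence[str] = ()) -> List[List[str]]:
    """One command per file; assumes the executable takes the input file as its first argument."""
    return [[exe, path, *extra_args] for path in paths]

def run_all(pattern: str, exe: str = EXE, extra_args: Sequence[str] = ()) -> None:
    for cmd in build_commands(expand(pattern), exe, extra_args):
        print("transcribing", cmd[1])
        subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failure

if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python batch.py "D:\recordings\*.mp3" [extra options for the executable]
    run_all(sys.argv[1], extra_args=sys.argv[2:])
```

Passing the command as a list (rather than one shell string) avoids quoting problems with spaces in Windows paths.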
-
StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Has anyone had any luck with an offline, free, open-source real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)?
* https://github.com/ictnlp/StreamSpeech
* https://github.com/k2-fsa/sherpa-onnx
* https://github.com/openai/whisper
I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Basically, a Babelfish that doesn't stick in the ear. Although real-time would be great, a 3- to 5-second delay is manageable.
RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.
Any suggestions?
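The app described here is a cascade: speech recognition, then translation, then speech synthesis. One thing that makes a 3- to 5-second target tighter than it looks is that the system must hear a chunk of speech before translating it, so the chunk length counts against the budget along with every stage's processing time. A minimal sketch of that accounting, with hypothetical stub stages standing in for real engines (none of these names come from StreamSpeech, sherpa-onnx or Whisper):

```python
import time
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]

def run_cascade(chunk: Any, chunk_seconds: float, stages: List[Stage], budget_seconds: float = 5.0):
    """Run the stages in order and check the worst-case delay against a budget.

    The first word of a chunk waits for the whole chunk to be spoken and then for
    every stage, so the delay is the chunk length plus total processing time.
    """
    data = chunk
    timings = []
    for name, fn in stages:
        t0 = time.perf_counter()
        data = fn(data)
        timings.append((name, time.perf_counter() - t0))
    delay = chunk_seconds + sum(t for _, t in timings)
    return data, timings, delay <= budget_seconds

# Hypothetical stand-ins: swap in real ASR, translation and TTS engines.
stages = [
    ("asr", lambda audio: "hello"),
    ("mt", lambda text: {"hello": "안녕하세요"}.get(text, text)),
    ("tts", lambda text: f"<speech:{text}>"),
]
audio = b"\x00" * 64000  # 2 s of silent 16 kHz, 16-bit mono audio
output, timings, ok = run_cascade(audio, chunk_seconds=2.0, stages=stages)
print(output, ok)
```

On an older phone the stages, not the stubs' near-zero times, dominate; shorter chunks cut waiting time but give the recognizer less context.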
-
whisper.api
This project provides an API, with support for user-level access, that transcribes speech to text using a fine-tuned and processed Whisper ASR model.
-
CrisperWhisper
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Project mention: CrisperWhisper: Automatic Speech Recognition with improved word-level timestamps | news.ycombinator.com | 2024-11-22
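Verbatim transcription keeps fillers in the text, which is what you want for analysis but not always for display. A post-processing sketch for stripping them, assuming fillers appear as bracketed tokens such as `[UH]` and `[UM]`; treat those markers as an assumption and adjust `fillers` to whatever your model actually emits. `strip_fillers` is a hypothetical helper.

```python
import re
from typing import Iterable, Tuple

def strip_fillers(text: str, fillers: Iterable[str] = ("UH", "UM")) -> Tuple[str, int]:
    """Remove bracketed filler tokens; return the clean text and how many were removed."""
    # Assumed marker format: the filler name in square brackets, e.g. "[UM]".
    pattern = re.compile(r"\[(?:%s)\]" % "|".join(re.escape(f) for f in fillers))
    cleaned, count = pattern.subn("", text)
    # collapse the runs of whitespace left behind where fillers sat between words
    return " ".join(cleaned.split()), count

print(strip_fillers("So [UM] I think [UH] we should go"))  # ('So I think we should go', 2)
```

Returning the count alongside the text keeps the filler information available (e.g. as a disfluency rate) even after the display text is cleaned.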
-
AutoSub
A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui (by abhirooptalasila)
-
For those interested in exploring or extracting YouTube transcripts for their own projects, tools like Transcriptly and Rev.com offer additional functionality, such as downloading, editing, and translating captions.
-
Python ASR related posts
-
The Technology Behind YouTube’s Auto-Captioning System
-
Show HN: Mikey – No bot meeting notetaker for Windows
-
Ask HN: Is Whisper Still Relevant?
-
Transcriber AI – Free, end-to-end machine based transcription with speaker id
-
Supercharge Your AI Skills: 5 Open Source Repositories You Can't Afford to Miss
-
Benchmark GGUF models with a one line of code
-
A note from our sponsor - SaaSHub
www.saashub.com | 14 May 2025
Index
What are some of the best open-source ASR projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | whisperX | 15,599 |
2 | NeMo | 14,217 |
3 | PaddleSpeech | 11,875 |
4 | speechbrain | 9,808 |
5 | SenseVoice | 5,578 |
6 | nexa-sdk | 4,534 |
7 | wenet | 4,490 |
8 | youtube-transcript-api | 3,876 |
9 | lingvo | 2,838 |
10 | whisper-asr-webservice | 2,579 |
11 | whisper-timestamped | 2,393 |
12 | whisper-standalone-win | 2,040 |
13 | SincNet | 1,171 |
14 | StreamSpeech | 1,071 |
15 | vosk-server | 1,042 |
16 | pykaldi | 1,015 |
17 | whisper.api | 880 |
18 | CrisperWhisper | 696 |
19 | cheetah | 622 |
20 | AutoSub | 595 |
21 | pyannote-whisper | 586 |
22 | leopard | 455 |
23 | reverb | 402 |