Top 23 Python speech-recognition Projects
-
transformers
Play around with OpenAI's GPT models and Hugging Face's Transformers library.
-
whisperX
Project mention: WhisperX: Precise ASR with Word-Level Timestamps and Diarization | news.ycombinator.com | 2024-09-05
-
faster-whisper
Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26
I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably a work in progress, but it does the job and has a nice UI for editing transcriptions, speakers, etc.
It's running on the CPU for me; it would be nice to have something that can make use of a 4 GB Nvidia GPU, which faster-whisper is actually able to do [1]
https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
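For reference, a minimal sketch of how faster-whisper can be fit onto a small GPU by picking a compact model and 8-bit quantization; the model size, compute type, and file name below are illustrative assumptions rather than anything from the thread:

```python
# Hedged sketch: faster-whisper on a ~4 GB NVIDIA GPU.
# "small" + int8_float16 is an assumption about what fits in that much VRAM.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="int8_float16")

# "meeting.wav" is a placeholder input file.
segments, info = model.transcribe("meeting.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```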
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
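As a quick illustration, PaddleSpeech exposes CLI-style executors that can be called from Python; the model name, language, and file name here are assumptions based on the project's published examples:

```python
# Hedged sketch of PaddleSpeech's Python ASR API via its CLI executor.
from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()
text = asr(
    audio_file="zh.wav",            # placeholder: 16 kHz mono WAV
    model="conformer_wenetspeech",  # assumed Mandarin ASR model from the docs
    lang="zh",
    sample_rate=16000,
)
print(text)
```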
-
speechbrain
Simple Diarizer is a speaker diarization library that utilizes pretrained models from SpeechBrain. To get started with simple_diarizer, see the sketch below.
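A minimal sketch, assuming the xvec embedding model and spectral clustering; the file name and speaker count are placeholders:

```python
# Hedged sketch: speaker diarization with simple_diarizer,
# which wraps pretrained SpeechBrain embedding models.
from simple_diarizer.diarizer import Diarizer

diar = Diarizer(
    embed_model="xvec",   # "ecapa" is the other documented option
    cluster_method="sc",  # spectral clustering
)

# "interview.wav" and num_speakers=2 are placeholder assumptions.
segments = diar.diarize("interview.wav", num_speakers=2)
for seg in segments:
    # Each segment is expected to carry start/end times and a speaker label.
    print(seg)
```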
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Start and Stop Listening Example
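A rough sketch of the library's background-listening pattern; the Google Web Speech engine, calibration step, and timings are assumptions:

```python
# Hedged sketch: start and stop background listening with SpeechRecognition.
import time
import speech_recognition as sr

r = sr.Recognizer()

def on_audio(recognizer, audio):
    # Google Web Speech API is just one of the supported engines.
    try:
        print("Heard:", recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        print("Could not understand audio")

mic = sr.Microphone()
with mic as source:
    r.adjust_for_ambient_noise(source)

# listen_in_background returns a function that stops the listener.
stop_listening = r.listen_in_background(mic, on_audio)
time.sleep(10)                       # keep listening for ~10 seconds
stop_listening(wait_for_stop=False)  # then stop the background thread
```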
-
espnet
Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17
You might check out this list from espnet. They list the different corpora they use to train their models, sorted by language and task (ASR, TTS, etc.):
https://github.com/espnet/espnet/blob/master/egs2/README.md
-
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Project mention: FunASR: Fundamental End-to-End Speech Recognition Toolkit | news.ycombinator.com | 2024-01-13
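To give a flavor of the toolkit, here is a hedged sketch of FunASR's AutoModel API combining ASR, VAD, and punctuation restoration; the model identifiers and file name are assumptions based on the project's examples:

```python
# Hedged sketch: ASR + VAD + punctuation restoration with FunASR.
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",  # assumed Paraformer ASR checkpoint
    vad_model="fsmn-vad",   # assumed voice activity detection model
    punc_model="ct-punc",   # assumed punctuation model
)

# "speech.wav" is a placeholder input file.
result = model.generate(input="speech.wav")
print(result[0]["text"])  # generate is expected to return a list of result dicts
```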
-
wenet
-
Porcupine
-
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Details will be shared tomorrow, but from what I have read, they have distilled the large model's decoder into this turbo model, which has only 4 layers instead of 32; the encoder should remain the same size. Similar to https://github.com/huggingface/distil-whisper, but the model is distilled using multilingual data instead of just English, and the decoder has 4 layers instead of 2.
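For comparison, distil-whisper checkpoints can be loaded through the Hugging Face Transformers pipeline; the checkpoint name and audio file below are assumptions:

```python
# Hedged sketch: running a distil-whisper checkpoint via the Transformers pipeline.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v3",  # assumed checkpoint name
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
)

# chunk_length_s enables long-form transcription by chunking the audio.
result = asr("podcast.mp3", chunk_length_s=25, return_timestamps=True)
print(result["text"])
```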
-
lingvo
-
whisper-asr-webservice
-
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Project mention: Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old | news.ycombinator.com | 2024-02-28
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them; whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
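A hedged sketch of how whisper-timestamped is typically called; the model size and file name are placeholders:

```python
# Hedged sketch: word-level timestamps and confidence with whisper-timestamped.
import whisper_timestamped as whisper

audio = whisper.load_audio("video_audio.wav")  # placeholder input file
model = whisper.load_model("small")            # placeholder model size

result = whisper.transcribe(model, audio)
for segment in result["segments"]:
    for word in segment["words"]:
        print(f'{word["start"]:.2f}-{word["end"]:.2f}s '
              f'{word["text"]} (conf={word["confidence"]:.2f})')
```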
-
lip-reading-deeplearning
Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
-
kalliope
-
Dragonfire
-
whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
On Windows I use whisper-standalone-win: https://github.com/Purfview/whisper-standalone-win
It has a few customization features that are nice: https://github.com/Purfview/whisper-standalone-win/discussio...
Works miles better than plain faster-whisper, in my experience. Not sure if there's wildcard support but that's easily scripted.
-
SpeechT5
-
SincNet
-
kaldi-gstreamer-server
Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
-
quillman
A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.
https://github.com/modal-labs/quillman
I built something similar using WebKit speech recognition but that's limited to Chromium.
-
SALMONN
Project mention: Comparing Humans, GPT-4, and GPT-4V on Abstraction and Reasoning Tasks | news.ycombinator.com | 2023-11-19
> In other words, if you express a problem in a more complicated space (e.g. a visual problem, or an abstract algebra problem), you will not be able to solve it in the smaller token space, there's not enough information
You're aware multimodal transformers do exactly this?
https://github.com/bytedance/SALMONN
Python speech-recognition related posts
-
Show HN: Auto Transcribe/Subtitle Videos – Python OpenAI Whisper Wrapper and GUI
-
New OpenAI Whisper model: "turbo"
-
WhisperX: Precise ASR with Word-Level Timestamps and Diarization
-
WhisperX: Precise ASR with Word-Level Timestamps and Speaker Diarization
-
OctoTube: Voice Search for YouTube
-
OTranscribe: A free and open tool for transcribing audio interviews
-
StreamSpeech: "All in One" model for simultaneous ASR, translation and TTS
Index
What are some of the best open-source speech-recognition projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 132,909 |
2 | whisperX | 11,720 |
3 | faster-whisper | 11,628 |
4 | PaddleSpeech | 10,970 |
5 | speechbrain | 8,624 |
6 | SpeechRecognition | 8,356 |
7 | espnet | 8,346 |
8 | FunASR | 6,179 |
9 | wenet | 4,085 |
10 | Porcupine | 3,706 |
11 | distil-whisper | 3,538 |
12 | lingvo | 2,811 |
13 | whisper-asr-webservice | 1,988 |
14 | whisper-timestamped | 1,913 |
15 | lip-reading-deeplearning | 1,820 |
16 | kalliope | 1,711 |
17 | Dragonfire | 1,384 |
18 | whisper-standalone-win | 1,188 |
19 | SpeechT5 | 1,163 |
20 | SincNet | 1,115 |
21 | kaldi-gstreamer-server | 1,069 |
22 | quillman | 1,030 |
23 | SALMONN | 996 |