faster-whisper vs whisperX
| | faster-whisper | whisperX |
|---|---|---|
| Mentions | 25 | 34 |
| Stars | 17,074 | 16,683 |
| Growth | 5.0% | 4.2% |
| Activity | 7.5 | 8.8 |
| Latest commit | about 1 month ago | 15 days ago |
| Language | Python | Python |
| License | MIT License | BSD 2-clause "Simplified" License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
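The page does not publish its exact activity formula. The sketch below is only one plausible reading of the description above, assuming an exponential decay with a hypothetical 30-day half-life for commit weights and a percentile-rank mapping onto a 0 to 10 scale; both helper names are hypothetical.

```python
from datetime import date


def activity_raw(commit_dates: list[date], today: date, half_life_days: float = 30.0) -> float:
    """Sum of commit weights; a commit's weight halves every `half_life_days` (assumed decay)."""
    total = 0.0
    for d in commit_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore commits dated in the future
        total += 0.5 ** (age / half_life_days)
    return total


def activity_score(raw: float, all_raw: list[float]) -> float:
    """Map a raw value to 0-10 by the share of tracked projects it beats."""
    if not all_raw:
        return 0.0
    below = sum(1 for r in all_raw if r < raw)
    return round(10.0 * below / len(all_raw), 1)
```

With this mapping, a project that beats 90 of 100 tracked projects scores 9.0, which matches the "top 10%" reading given above.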
faster-whisper
-
Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model
Hi, I don't know what's SOTA, but I got good results with these (open source, on-device):
https://github.com/SYSTRAN/faster-whisper (speech-to-text)
-
Self-hosted offline transcription and diarization service with LLM summary
I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably a work in progress, but it does the job and has a nice UI for editing transcriptions, speakers, etc.
It's running on the CPU for me; it would be nice to have something that can make use of a 4GB Nvidia GPU, which faster-whisper is actually able to do [1].
[1] https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
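Whether a Whisper model fits on a 4GB card depends largely on the precision of its weights. A rough sketch, assuming the approximate parameter counts reported in the Whisper paper and counting weights only; the 70% headroom factor is a hypothetical rule of thumb for activations and decoding buffers, not a measured value.

```python
# Approximate parameter counts from the Whisper paper (weights only).
WHISPER_PARAMS = {
    "tiny": 39e6,
    "base": 74e6,
    "small": 244e6,
    "medium": 769e6,
    "large": 1550e6,
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_gib(model: str, precision: str) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return WHISPER_PARAMS[model] * BYTES_PER_PARAM[precision] / 2**30


def fits(model: str, precision: str, vram_gib: float, headroom: float = 0.7) -> bool:
    """True if the weights use at most `headroom` of VRAM (hypothetical safety margin)."""
    return weight_gib(model, precision) <= vram_gib * headroom
```

Under these assumptions, the large model in float16 needs about 2.9 GiB for weights alone, while int8 roughly halves that. In faster-whisper, precision is chosen with the `compute_type` argument of `WhisperModel` (the README shows values such as `"float16"` and `"int8_float16"`).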
-
Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow
Faster-whisper (https://github.com/SYSTRAN/faster-whisper)
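A subtitle pipeline like this ends by writing timestamped segments to a subtitle file. A minimal sketch of that SRT step, assuming segments arrive as plain `(start_seconds, end_seconds, text)` tuples; `srt_timestamp` and `to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from an iterable of (start_s, end_s, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        # Each cue: index, time range, text, then a blank line between cues.
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(blocks)
```

faster-whisper's segments expose `start`, `end`, and `text` attributes, so they could be adapted with `((s.start, s.end, s.text) for s in segments)` before calling `to_srt`.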
-
Using Groq to Build a Real-Time Language Translation App
For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
-
Apple Explores Home Robotics as Potential 'Next Big Thing'
Thermostats: https://www.sinopetech.com/en/products/thermostat/
I haven't tried running a local speech-to-text engine together with an LLM to control Home Assistant. Maybe someone is working on this already?
STT: https://github.com/SYSTRAN/faster-whisper
LLM: https://github.com/Mozilla-Ocho/llamafile/releases
LLM: https://huggingface.co/TheBloke/Nous-Hermes-2-Mixtral-8x7B-D...
It would take some tweaking to get the voice commands working correctly.
-
Whisper: Nvidia RTX 4090 vs. M1 Pro with MLX
Could someone elaborate on how this is accomplished, and whether there is any quality difference compared to the original Whisper?
With repos like https://github.com/SYSTRAN/faster-whisper it makes immediate sense why they're faster than the original, but with this one, not so much, especially considering it's even faster.
-
Now I Can Just Print That Video
Cool! I had the same project idea recently. You may be interested in this for the step of speech2text: https://github.com/SYSTRAN/faster-whisper
-
Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller
That's the implication. If the Distil models are in the same format as the original OpenAI models, then they can be converted for use with faster-whisper, following the conversion instructions at https://github.com/guillaumekln/faster-whisper/
So then we'll see whether we get the 6x model speedup on top of the stated 4x faster-whisper code speedup.
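If the two claimed speedups were fully independent, they would multiply (6 × 4 = 24x). If the runtime optimization only accelerates part of the work that remains after distillation, the overall gain is smaller, which Amdahl's law captures. A minimal sketch; the `fraction` values are hypothetical, not measurements.

```python
def compounded_speedup(model_speedup: float, runtime_speedup: float) -> float:
    """Best case: the two speedups are independent, so they multiply."""
    return model_speedup * runtime_speedup


def amdahl_speedup(fraction: float, speedup: float) -> float:
    """Amdahl's law: overall speedup when only `fraction` of the runtime gets `speedup`x faster."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)
```

For example, a 4x runtime speedup that applies to only half of the remaining work yields 1.6x overall, not 4x.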
- AMD May Get Across the CUDA Moat
-
Open Source Libraries
guillaumekln/faster-whisper
whisperX
-
Ask HN: What API or software are people using for transcription?
I use whisperfile[1] directly. The whisper-large-v3 model seems good with non-English transcription, which is my main use-case.
I am also eyeing whisperX[2], because I want to play some more with speaker diarization.
Your use case seems to be batch transcription, so I'd suggest you go ahead and just use whisperfile. It should work well on an M4 mini, and it also exposes an HTTP API if you start it without arguments.
If you want more interactivity, I have been using Vibe[3] as an open-source replacement of SuperWhisper[4], but VoiceInk from a sibling comment seems better.
Aside: so many of the mentioned projects use Whisper at their core that it would be interesting to explicitly mark the ones that don't, so we can have a real fundamental comparison.
[1] https://huggingface.co/Mozilla/whisperfile
[2] https://github.com/m-bain/whisperX
[3] https://github.com/thewh1teagle/vibe/
[4] https://superwhisper.com/
-
Ask HN: Is Whisper Still Relevant?
Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX
-
Show HN: Mikey – No bot meeting notetaker for Windows
https://github.com/m-bain/whisperX looks promising. I'm hacking away on an always-on transcriber for my notes, for later search and recall. It supports diarization (the speaker detection you're looking for).
That said, I'm currently using a mix of https://github.com/speaches-ai/speaches and https://github.com/ufal/whisper_streaming, mostly because my laptop doesn't have a decent GPU, so I stream the audio to a home server instead.
But overall it's pretty simple to do once you've wrangled the Python dependencies: all you need is a sink for the text files (for example, create a new file for every Teams meeting, but that's another story...)
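A "sink for the text files" can be very small. A minimal sketch, assuming one plain-text file per meeting named by start time; `TranscriptSink`, `start_meeting`, and `write` are hypothetical names, and the transcriber feeding it is left out.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


class TranscriptSink:
    """Hypothetical sink: one text file per meeting, transcript lines appended as they arrive."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.current: Optional[Path] = None

    def start_meeting(self, name: str, when: Optional[datetime] = None) -> Path:
        """Open a new file such as 20240102-030405-Team_sync.txt."""
        when = when or datetime.now()
        # Replace characters that are awkward in file names.
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in name)
        self.current = self.root / f"{when:%Y%m%d-%H%M%S}-{safe}.txt"
        self.current.touch()
        return self.current

    def write(self, text: str) -> None:
        """Append one transcript line to the current meeting's file."""
        if self.current is None:
            raise RuntimeError("call start_meeting() first")
        with self.current.open("a", encoding="utf-8") as f:
            f.write(text.rstrip() + "\n")
```

Appending line by line keeps each file usable for later search even if the transcriber stops mid-meeting.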
-
VLC tops 6B downloads, previews AI-generated subtitles
You don't need to wait, you can use: https://github.com/m-bain/whisperX right now for STT with timestamps and diarization.
-
Transcriber AI – Free, end-to-end machine based transcription with speaker id
I use Whisper and pyannote (https://github.com/m-bain/whisperX), but it's a pain to run locally; I run it on a 4080. This one seems to actually try to identify the speakers. I'm not sure what they're doing for that.
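Combining a transcriber with a diarization model means attaching speaker labels to transcribed text. A simplified sketch of one common approach, not whisperX's actual code: give each segment the speaker whose diarization turns overlap it the most. The tuple formats are assumptions.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """
    segments: list of (start_s, end_s, text) from the transcriber.
    turns: list of (start_s, end_s, speaker) from a diarization model.
    Returns (speaker, text) pairs; speaker is None when no turn overlaps.
    """
    labelled = []
    for s_start, s_end, text in segments:
        totals = {}
        for t_start, t_end, speaker in turns:
            ov = overlap(s_start, s_end, t_start, t_end)
            if ov > 0:
                totals[speaker] = totals.get(speaker, 0.0) + ov
        best = max(totals, key=totals.get) if totals else None
        labelled.append((best, text))
    return labelled
```

Summing overlap per speaker handles segments that straddle a speaker change by picking whoever talks longest within them.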
-
Supercharge Your AI Skills: 5 Open Source Repositories You Can't Afford to Miss
3. WhisperX
-
Show HN: Offline audiobook from any format with one CLI command
> And do you know a good speech to text model?
OpenAI's Whisper: the code and model are available, and multiple projects have built on it. You could try this wrapper: https://github.com/m-bain/whisperX, or, for short utterances on a smartphone, https://github.com/futo-org/whisper-acft
- WhisperX: Precise ASR with Word-Level Timestamps and Diarization
- WhisperX: Precise ASR with Word-Level Timestamps and Speaker Diarization
- Text-to-Speech with Speaker Diarization
What are some alternatives?
vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
whisper.cpp - Port of OpenAI's Whisper model in C/C++
ROCm - AMD ROCm™ Software - GitHub Home [Moved to: https://github.com/ROCm/ROCm]
openai-whisper-cpu - Improving transcription performance of OpenAI Whisper for CPU based deployment