| | whisper.cpp | faster-whisper |
|---|---|---|
| Mentions | 199 | 25 |
| Stars | 40,569 | 16,493 |
| Growth | 2.5% | 4.7% |
| Activity | 9.9 | 7.5 |
| Latest commit | 8 days ago | 11 days ago |
| Language | C++ | Python |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
whisper.cpp
- Ask HN: What API or software are people using for transcription?
Whisper large-v3 from OpenAI, but we host it ourselves on Modal.com. It's easy, fast, has no rate limits, and is cheap as well.
If you want to run it locally, I'd still go with Whisper; then I'd look at something like whisper.cpp https://github.com/ggml-org/whisper.cpp. It runs quite well.
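For running whisper.cpp locally as suggested above, a minimal Python sketch that drives the command-line binary. The binary name (`whisper-cli`, formerly `main`) and the file paths are assumptions about your local build:

```python
# Hedged sketch: invoking a locally built whisper.cpp binary.
# Run the assembled command with subprocess.run(cmd, check=True);
# binary name and paths below are assumptions about your build.
def build_whisper_cmd(binary: str, model: str, audio: str) -> list:
    """Assemble a whisper.cpp invocation: -m selects the ggml model
    file, -f the input audio (whisper.cpp expects 16 kHz WAV)."""
    return [binary, "-m", model, "-f", audio]

# Example (not executed here):
# build_whisper_cmd("./whisper-cli", "models/ggml-base.en.bin", "audio.wav")
```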
- Whispercpp – Local, Fast, and Private Audio Transcription for Ruby
- Build Your Own Siri. Locally. On-Device. No Cloud
not the gp but found this https://github.com/ggml-org/whisper.cpp/blob/master/models/c...
- Run LLMs on Apple Neural Engine (ANE)
Actually that's a really good question, I hadn't considered that the comparison here is just CPU vs using Metal (CPU+GPU).
To answer the question, though: I think this would be used for cases where you are building an app that wants to use a small AI model while keeping the GPU free for graphics-related work, which I'm guessing is why Apple put these into their hardware in the first place.
Here is an interesting comparison between the two from a whisper.cpp thread - ignoring startup times - the CPU+ANE seems about on par with CPU+GPU: https://github.com/ggml-org/whisper.cpp/pull/566#issuecommen...
- Building a personal, private AI computer on a budget
A great thread with the type of info you're looking for lives here: https://github.com/ggerganov/whisper.cpp/issues/89
But you can likely find similar threads for the llama.cpp benchmark here: https://github.com/ggerganov/llama.cpp/tree/master/examples/...
These are good examples because the llama.cpp and whisper.cpp benchmarks take full advantage of the Apple hardware but also take full advantage of non-Apple hardware with GPU support, AVX support etc.
It's been true for a while now that the memory bandwidth of modern Apple systems, in tandem with the neural cores and GPU, has made them very competitive with Nvidia for local inference and even training.
- Whisper.cpp: Looking for Maintainers
- Show HN: Galene-stt: automatic captioning for the Galene videoconferencing system
- Show HN: Transcribe YouTube Videos
Not as convenient, but you could also have the user manually install the model, like whisper does.
Just forward the error message output by whisper, or even make a more user-friendly error message with instructions on how/where to download the models.
Whisper does provide a simple bash script to download models: https://github.com/ggerganov/whisper.cpp/blob/master/models/...
(As a Windows user, I can run bash scripts via Git Bash for Windows[1])
[1]: https://git-scm.com/download/win
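A rough Python equivalent of that download script, for users who would rather not run bash at all. This assumes, as the script does, that the ggml models are hosted on the `ggerganov/whisper.cpp` Hugging Face repo; the helper names are my own:

```python
# Sketch mirroring whisper.cpp's model download script (assumption:
# models live under huggingface.co/ggerganov/whisper.cpp).
import urllib.request

def ggml_model_url(name: str) -> str:
    """Build the download URL for a named ggml model, e.g. 'base.en'."""
    return ("https://huggingface.co/ggerganov/whisper.cpp"
            f"/resolve/main/ggml-{name}.bin")

def download_model(name: str, dest: str) -> None:
    """Fetch the model file to a local path."""
    urllib.request.urlretrieve(ggml_model_url(name), dest)

# Example (not executed here):
# download_model("base.en", "models/ggml-base.en.bin")
```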
- OTranscribe: A free and open tool for transcribing audio interviews
- Show HN: I created an automatic subtitling app to boost short videos
whisper.cpp [1] has a karaoke example that uses ffmpeg's drawtext filter to display rudimentary karaoke-like captions. It also supports diarisation. Perhaps it could be a starting point to create a better script that does what you need.
--
1: https://github.com/ggerganov/whisper.cpp/blob/master/README....
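The karaoke example above renders captions through ffmpeg's drawtext filter. As a starting point for "a better script", here is a hypothetical helper (not the actual whisper.cpp script) that turns timestamped segments into a drawtext filtergraph, one filter per caption enabled only during its time window:

```python
# Hypothetical sketch: build an ffmpeg drawtext filtergraph from
# (start_seconds, end_seconds, text) caption tuples.
def drawtext_filter(captions) -> str:
    parts = []
    for start, end, text in captions:
        # Escape characters that are special inside drawtext values.
        safe = text.replace("'", r"\'").replace(":", r"\:")
        parts.append(
            f"drawtext=text='{safe}':fontsize=24:fontcolor=white:"
            f"x=(w-text_w)/2:y=h-60:enable='between(t,{start},{end})'"
        )
    # Chain the filters; pass the result to ffmpeg via -vf.
    return ",".join(parts)
```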
faster-whisper
- Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model
Hi, I don't know what's SOTA, but I got good results with these (open source, on-device):
https://github.com/SYSTRAN/faster-whisper (speech-to-text)
- Self-hosted offline transcription and diarization service with LLM summary
I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably work-in-progress but it does the job and has a nice UI to edit transcriptions and speakers etc.
It's running on the CPU for me; it would be nice to have something that can make use of a 4 GB Nvidia GPU, which faster-whisper is actually able to do [1]
https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
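A minimal sketch of the faster-whisper API along those lines (requires `pip install faster-whisper`). The int8 compute type is what the README suggests for fitting the "small" model on a ~4 GB GPU; the model size, device, and the SRT-timestamp helper are my assumptions:

```python
# Sketch of transcribing with faster-whisper on a small GPU.
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT-style HH:MM:SS,mmm timestamp (my helper)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def transcribe(path: str) -> None:
    from faster_whisper import WhisperModel  # deferred: optional dependency
    # int8_float16 quantization keeps the "small" model within ~4 GB VRAM.
    model = WhisperModel("small", device="cuda", compute_type="int8_float16")
    segments, _info = model.transcribe(path)
    for seg in segments:
        print(f"[{format_timestamp(seg.start)} -> "
              f"{format_timestamp(seg.end)}] {seg.text}")
```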
- Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow
Faster-whisper (https://github.com/SYSTRAN/faster-whisper)
- Using Groq to Build a Real-Time Language Translation App
For our real-time STT needs, we'll employ a fantastic library called faster-whisper.
- Apple Explores Home Robotics as Potential 'Next Big Thing'
Thermostats: https://www.sinopetech.com/en/products/thermostat/
I haven't tried running a local speech-to-text engine backed by an LLM to control Home Assistant. Maybe someone is working on this already?
STT: https://github.com/SYSTRAN/faster-whisper
LLM: https://github.com/Mozilla-Ocho/llamafile/releases
LLM: https://huggingface.co/TheBloke/Nous-Hermes-2-Mixtral-8x7B-D...
It would take some tweaking to get the voice commands working correctly.
- Whisper: Nvidia RTX 4090 vs. M1 Pro with MLX
Could someone elaborate on how this is accomplished, and is there any quality disparity compared to the original Whisper?
Repos like https://github.com/SYSTRAN/faster-whisper make immediate sense as to why they're faster than the original, but this one, not so much, especially considering it's even much faster.
- Now I Can Just Print That Video
Cool! I had the same project idea recently. You may be interested in this for the step of speech2text: https://github.com/SYSTRAN/faster-whisper
- Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller
That's the implication. If the distil models are in the same format as the original OpenAI models, then they can be converted for faster-whisper use as per the conversion instructions on https://github.com/guillaumekln/faster-whisper/
So then we'll see whether we get the 6x model speedup on top of the stated 4x faster-whisper code speedup.
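For context, faster-whisper loads CTranslate2 models, so a Hugging Face Whisper checkpoint is first converted with CTranslate2's `ct2-transformers-converter` tool (`pip install ctranslate2 transformers`). A hedged sketch; the model id shown is illustrative:

```python
# Sketch: assemble the CTranslate2 conversion command for a Whisper
# checkpoint; run the result with subprocess.run(cmd, check=True).
def build_convert_cmd(model_id: str, output_dir: str,
                      quantization: str = "float16") -> list:
    """Build the ct2-transformers-converter invocation."""
    return [
        "ct2-transformers-converter",
        "--model", model_id,
        "--output_dir", output_dir,
        "--quantization", quantization,
    ]

# Example (not executed here):
# build_convert_cmd("distil-whisper/distil-large-v2", "distil-large-v2-ct2")
```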
- AMD May Get Across the CUDA Moat
- Open Source Libraries
guillaumekln/faster-whisper
What are some alternatives?
bark - 🔊 Text-Prompted Generative Audio Model
whisperX - WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
ROCm - AMD ROCm™ Software - GitHub Home [Moved to: https://github.com/ROCm/ROCm]