Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
buzz
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
It runs locally, using Whisper.cpp[1], a Whisper implementation optimized to run on CPU, especially Apple Silicon.
Whisper itself is open source, and so is that implementation, the OpenAI endpoint is merely a convenience to those who don't wish to host a Whisper server themselves, deal with batching, renting GPUs etc. If you're making a commercial service based on Whisper, the API might be worth it for the convenience, but if you're running it personally and have a good enough machine (an M1 MacBook Air will do), running it locally is usually better.
[1] https://github.com/ggerganov/whisper.cpp
Here's a multi-platform open source app that does the same thing but uses vosk instead of whisper.
https://github.com/bugbakery/audapolis
I have a Python script on my mac that detects when I press-and-hold the right option key, and records audio while it's pressed. On release, it transcribes it with whispercpp and pastes it. Makes it very easy to record quick voice notes. Here it is: https://github.com/corlinp/whisperer
I was working on a native version in the form of a taskbar app with customizable prompt and all. However I quickly realized that the behaviors I want the app to do require a bunch of accessibility permissions that would block it from the app store and require more setup steps.
Would anybody still find that useful?
There is a great library that has support not only with OpenAIs whisper but many others that also work offline. https://github.com/Uberi/speech_recognition
https://github.com/MahmoudAshraf97/whisper-diarization
This project has been alright for transcribing audio with speaker diarization. A big finicky. The OpenAI model is better than other paid products(Descript, Riverside) so I’m looking forward to trying MacWhisper.
Shameless plug: recently launched LLMStack (https://github.com/trypromptly/LLMStack) and I have some custom pipelines built as apps on LLMStack that I use to transcribe and translate.
Granted my use cases are not high volume or frequent but being able to take output from Whisper and pipe it to other models has been very powerful for me. It is also amazing how good the quality of Whisper is when handling non English audio.
We added LocalAI (https://localai.io) support to LLMStack in the last release. Will try to use whisper.cpp and see how that compares for my use cases.
The OpenAi CLI does that, follow the instructions https://github.com/openai/whisper