Whisper cpp: https://github.com/ggerganov/whisper.cpp. Written in C/C++. Super fast to boot up and run. Works on-device (e.g. a laptop or phone) since it's quantised and implemented in plain C/C++. Quoted as transcribing 1 hr of audio in approx 8.5 minutes (so about 17x slower than Whisper JAX on TPU v4)
Original Whisper: https://github.com/openai/whisper. Baseline implementation
Hugging Face Whisper: https://huggingface.co/openai/whisper-large-v2#long-form-transcription. Uses an efficient batching algorithm to give a 7x speed-up on long-form audio samples. By far the easiest way of using Whisper: just pip install transformers and run it as per the code sample! No crazy dependencies, easy API, no extra optimisation packages, loads of documentation and love on GitHub ❤️. Compatible with fine-tuning if you want this!
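A sketch of that code sample, assuming `pip install transformers`; the checkpoint, chunk length, batch size, and audio path are illustrative. The long-form speed-up comes from `chunk_length_s`, which splits long audio into windows that are then batched through the model together:

```python
# Hugging Face Whisper with chunked long-form batching.
# Model name, chunk_length_s, batch_size, and audio path are assumptions.
from transformers import pipeline

def transcribe_long_form(audio_path: str,
                         model: str = "openai/whisper-large-v2") -> str:
    pipe = pipeline(
        "automatic-speech-recognition",
        model=model,
        chunk_length_s=30,  # chunk long audio into 30s windows
        batch_size=8,       # batch the chunks for the 7x long-form speed-up
    )
    return pipe(audio_path)["text"]

if __name__ == "__main__":
    print(transcribe_long_form("audio.mp3"))
```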
Whisper JAX: https://github.com/sanchit-gandhi/whisper-jax. Builds on the Hugging Face implementation. Written in JAX (instead of PyTorch), where you get a 10x or more speed-up if you run it on TPU v4 hardware (I've gotten up to 15x with large batch sizes for super long audio files). Overall, 70-100x faster than OpenAI if you run it on TPU v4
Faster Whisper: https://github.com/guillaumekln/faster-whisper. 4x faster than the original, including for short-form audio samples, but no extra gains for long-form audio on top of this
Whisper X: https://github.com/m-bain/whisperX. Uses Faster Whisper under the hood, so gets the same speed-ups.