[D] What is the most efficient version of OpenAI Whisper?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • Whisper cpp: https://github.com/ggerganov/whisper.cpp. Written in C/C++. Super fast to boot up and run, and works on-device (e.g. a laptop or phone) since the model is quantised and the whole thing is plain C/C++. Quoted as transcribing 1 hr of audio in approx. 8.5 minutes (so about 17x slower than Whisper JAX on TPU v4)

  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

  • Original Whisper: https://github.com/openai/whisper. The baseline PyTorch implementation (a minimal usage sketch is included after this list)

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

  • Hugging Face Whisper: https://huggingface.co/openai/whisper-large-v2#long-form-transcription. Uses an efficient batching algorithm to give a 7x speed-up on long-form audio samples. By far the easiest way of using Whisper: just pip install transformers and run it as per the code sample (see the sketch after this list). No crazy dependencies, easy API, no extra optimisation packages, loads of documentation and love on GitHub ❤️. Compatible with fine-tuning if you want it!

  • whisper-jax

    JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

  • Whisper JAX: https://github.com/sanchit-gandhi/whisper-jax. Builds on the Hugging Face implementation. Written in JAX (instead of PyTorch), so you get a 10x or greater speed-up if you run it on TPU v4 hardware (I've gotten up to 15x with large batch sizes for super long audio files). Overall, 70-100x faster than the OpenAI implementation when run on a TPU v4 (see the sketch after this list)

  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • Faster Whisper: https://github.com/guillaumekln/faster-whisper. Roughly 4x faster than the original implementation, including for short-form audio samples, but no extra gains for long-form audio on top of that (see the sketch after this list)

  • whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

  • Whisper X: https://github.com/m-bain/whisperX. Uses Faster Whisper under the hood, so it gets the same speed-ups, and adds word-level timestamps and speaker diarization (see the sketch after this list)
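
Quick usage sketches for the implementations above follow. These are hedged examples rather than code from the original post: model names, audio file paths, devices, and batch sizes are illustrative placeholders. The baseline OpenAI package exposes a simple load-and-transcribe API:

```python
# Baseline OpenAI Whisper (https://github.com/openai/whisper)
# pip install openai-whisper
import whisper

model = whisper.load_model("large-v2")   # model size is illustrative
result = model.transcribe("audio.mp3")   # placeholder audio path
print(result["text"])
```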
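
The Hugging Face route is the chunked `pipeline` API; setting `chunk_length_s` turns on the batched long-form algorithm linked in the item above. A minimal sketch:

```python
# Hugging Face Transformers chunked pipeline for long-form transcription
# pip install transformers
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,   # split long audio into 30-second chunks
    device=0,            # GPU index; drop this argument to run on CPU
)

out = pipe("audio.mp3", batch_size=8, return_timestamps=True)
print(out["text"])
```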
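
For Whisper JAX, this follows the usage documented in the repo's README (the pipeline class is spelled `FlaxWhisperPipline` there); treat it as a sketch and install a JAX build that matches your accelerator:

```python
# Whisper JAX (https://github.com/sanchit-gandhi/whisper-jax)
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline

# half-precision weights plus batched chunking; the values are illustrative
pipeline = FlaxWhisperPipline("openai/whisper-large-v2", dtype=jnp.bfloat16, batch_size=16)

outputs = pipeline("audio.mp3")   # the first call JIT-compiles; later calls are fast
print(outputs["text"])
```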
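
Faster Whisper loads the CTranslate2-converted weights and yields segments lazily from a generator. An illustrative sketch:

```python
# faster-whisper (CTranslate2 backend)
# pip install faster-whisper
from faster_whisper import WhisperModel

# device/compute_type are illustrative; int8 on CPU is a common alternative
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language:", info.language)
for segment in segments:   # generator: transcription happens as you iterate
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```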
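
WhisperX runs batched Faster Whisper transcription and then a separate alignment model for word-level timestamps; this sketch follows the flow in the repo's README, with illustrative model, device, and batch-size values:

```python
# WhisperX: batched transcription + word-level timestamp alignment
# pip install whisperx
import whisperx

device = "cuda"   # illustrative
model = whisperx.load_model("large-v2", device, compute_type="float16")

audio = whisperx.load_audio("audio.mp3")
result = model.transcribe(audio, batch_size=16)

# second pass: align segments to get word-level timestamps
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)
print(result["segments"])
```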

Related posts

  • Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller

    14 projects | news.ycombinator.com | 31 Oct 2023
  • Whisper.api: An open source, self-hosted speech-to-text with fast transcription

    5 projects | news.ycombinator.com | 22 Aug 2023
  • VLLM: 24x faster LLM serving than HuggingFace Transformers

    3 projects | news.ycombinator.com | 20 Jun 2023
  • Does openai whisper works on termux ?

    2 projects | /r/termux | 26 May 2023
  • Show HN: Willow Inference Server: Optimized ASR/TTS/LLM for Willow/WebRTC/REST

    3 projects | news.ycombinator.com | 23 May 2023