Faster Whisper Transcription with CTranslate2

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • tinydiarize

    Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens

  • Not that I can see; the developer's roadmap[1] is currently at writing a blog post about it and trying different sampling methods, so expanding to the large model looks to be a long way off. That's a real pity: even just a fine-tune of large for English would be a big help over the existing small English fine-tune.

    Could you go into more detail about your workflow? I'd been considering a two-pass approach myself until I discovered tinydiarize mentioned in whisper.cpp's --help text.

    1: https://github.com/akashmjn/tinydiarize#roadmap

  • faster-whisper

    Faster Whisper transcription with CTranslate2

  • whisper-diarization

    Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

  • The project page mentions whisper-diarization (speaker recognition) as a user of faster-whisper. I've been in the market for that, definitely going to try it out.

    https://github.com/MahmoudAshraf97/whisper-diarization

  • CTranslate2

    Fast inference engine for Transformer models

  • The original Whisper implementation from OpenAI uses the PyTorch deep learning framework. faster-whisper, on the other hand, is implemented on top of CTranslate2 [1], a custom inference engine for Transformer models. So it is running the same model but with a different backend, one specifically optimized for inference workloads.

    [1] https://github.com/OpenNMT/CTranslate2
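    To make the backend swap concrete, here is a minimal sketch of how faster-whisper is typically invoked. It assumes the faster-whisper package is installed and that "audio.mp3" is a local audio file; the model size and compute type shown are illustrative choices, not the only options.

    ```python
    def transcribe(path: str, model_size: str = "small") -> list[str]:
        """Transcribe an audio file with faster-whisper (CTranslate2 backend)."""
        # Imported inside the function so the sketch stays importable
        # even where faster-whisper is not installed.
        from faster_whisper import WhisperModel

        # int8 quantization keeps memory use low for CPU inference;
        # on a GPU you would typically pass device="cuda" instead.
        model = WhisperModel(model_size, device="cpu", compute_type="int8")

        # transcribe() returns a generator of segments plus language info;
        # decoding happens lazily as the generator is consumed.
        segments, info = model.transcribe(path)
        return [segment.text for segment in segments]

    if __name__ == "__main__":
        for line in transcribe("audio.mp3"):
            print(line)
    ```

    The calling code looks much like the original openai/whisper API, which is the point: same model weights, different (faster) inference engine underneath.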

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Now I Can Just Print That Video

    5 projects | news.ycombinator.com | 4 Dec 2023
  • Whisper Turbo: transcribe 20x faster than realtime using Rust and WebGPU

    3 projects | news.ycombinator.com | 12 Sep 2023
  • LeMUR: LLMs for Audio and Speech

    1 project | news.ycombinator.com | 27 Jul 2023
  • Faster Whisper Transcription with CTranslate2

    1 project | /r/hypeurls | 24 Jul 2023
  • OpenAI Whisper Audio Transcription Benchmarked on 18 GPUs: Up to 3,000 WPM | Tom's Hardware

    1 project | /r/hardware | 11 May 2023