Show HN: Self-host Whisper As a Service with GUI and queueing

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  • By the way, there is another project called whisper.cpp:

    https://github.com/ggerganov/whisper.cpp

    It uses about 8x less memory than the Python implementation for the tiny model. It is worth keeping an eye on, since Python bindings are planned on the roadmap:

    https://github.com/ggerganov/whisper.cpp#bindings

  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

  • I was working on this yesterday. It seems that the most common approach with Whisper is simply to break the audio into chunks and transcribe each one separately. This works but as you'd expect sometimes has trouble at the edges.

    You could do better by overlapping the segments, except then stitching the transcriptions together becomes an issue since whisper doesn't provide reliable per-token timestamps [0], and the output of the common part of overlapping segments isn't necessarily the same.

    Some more useful discussion here [1].

    [0] https://github.com/openai/whisper/discussions/332
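The overlapping-chunk idea from the comment above can be sketched as follows. This is a minimal illustration, not code from Whisper or from any project listed here: the function name `chunk_spans` and its parameters are hypothetical, and the 30-second default is only an assumption chosen to match Whisper's 30-second input window. It computes the time spans to transcribe and deliberately leaves out the hard part the comment describes, which is stitching the overlapping transcriptions back together.

```python
from typing import List, Tuple


def chunk_spans(
    total_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0
) -> List[Tuple[float, float]]:
    """Return (start, end) spans in seconds that cover [0, total_s].

    Consecutive spans share `overlap_s` seconds, so a word cut off at the
    end of one chunk appears whole in the next one.
    """
    if chunk_s <= 0 or not (0 <= overlap_s < chunk_s):
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")

    spans: List[Tuple[float, float]] = []
    step = chunk_s - overlap_s  # how far each new chunk advances
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:  # the last chunk reached the end of the audio
            break
        start += step
    return spans


if __name__ == "__main__":
    # Hypothetical 70-second recording split into overlapping chunks.
    for start, end in chunk_spans(70.0):
        print(f"{start:6.1f}s -> {end:6.1f}s")
```

Each span would then be cut from the audio and transcribed separately. Because the text produced for the shared region is not guaranteed to match between neighbouring chunks, any merge step still needs its own alignment strategy.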

  • WAAS

    Whisper as a Service (GUI and API with queuing for OpenAI Whisper)

  • transcribe-anything

    Input a local file or url and this service will transcribe it using Whisper AI. Completely private and Free 🤯🤯🤯

  • People interested in this might also be interested in transcribe-anything [1].

    It automates video fetching and uses whisper to generate .srt, .vtt and .txt files.

    [1] https://github.com/zackees/transcribe-anything
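As a rough illustration of what producing an .srt file from a transcription involves, here is a minimal sketch. It is not taken from transcribe-anything; the helper names `srt_timestamp` and `to_srt` are hypothetical, and the input is assumed to be a list of dicts with `start`, `end` (seconds) and `text` keys, which is the shape of the segments in the openai-whisper Python package's transcription result.

```python
from typing import Dict, Iterable, Union

Segment = Dict[str, Union[float, str]]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT: index, time range, text, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{str(seg['text']).strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up segments, only to show the output layout.
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " This is a test."},
    ]
    print(to_srt(demo))
```

The .vtt format is similar but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.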

  • frogbase

    Discontinued. Transform audio-visual content into navigable knowledge.

  • ai-notes

    Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.

  • nlp

  • whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

  • audioclerk

    Transcribe audio to text, watch folders to auto transcribe audio

