Show HN: Self-host Whisper As a Service with GUI and queueing

whisper.cpp

Mentions: 187 | Stars: 31,174 | Activity: 9.8 | Language: C

Port of OpenAI's Whisper model in C/C++

By the way, there is also another project called whisper.cpp:
https://github.com/ggerganov/whisper.cpp
It uses 8x less memory than the Python implementation for the tiny model. It would be a good idea to keep an eye on it, since Python bindings are planned on the roadmap:
https://github.com/ggerganov/whisper.cpp#bindings

whisper

Mentions: 343 | Stars: 60,303 | Activity: 6.4 | Language: Python

Robust Speech Recognition via Large-Scale Weak Supervision

I was working on this yesterday. It seems that the most common approach with Whisper is simply to break the audio into chunks and transcribe each one separately. This works but, as you'd expect, it sometimes has trouble at the chunk edges.
You could do better by overlapping the segments, but then stitching the transcriptions together becomes an issue: Whisper doesn't provide reliable per-token timestamps [0], and the transcriptions of the shared part of two overlapping segments aren't necessarily the same.
Some more useful discussion here [1].
[0] https://github.com/openai/whisper/discussions/332
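
The overlapping-segment idea in the comment above can be sketched as a small windowing helper. This is only an illustration, not code from Whisper or any project listed here: the function name `overlapping_chunks` and the 30-second chunk and 5-second overlap defaults are assumptions chosen for the example. It computes chunk boundaries only; stitching the overlapping transcriptions, which the comment identifies as the hard part, is not addressed.

```python
from typing import List, Tuple


def overlapping_chunks(
    total_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds that cover [0, total_s].

    Hypothetical helper: consecutive windows share `overlap_s` seconds,
    so speech cut at one window's edge appears whole in its neighbour.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")

    step = chunk_s - overlap_s  # how far each new window advances
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:  # last window reached the end of the audio
            break
        start += step
    return windows
```

Each window would then be cut from the audio and transcribed separately; with `overlap_s=0` the helper reduces to the plain non-overlapping chunking described first.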

WAAS

Mentions: 12 | Stars: 1,732 | Activity: 7.3 | Language: JavaScript

Whisper as a Service (GUI and API with queuing for OpenAI Whisper)
transcribe-anything

Mentions: 11 | Stars: 351 | Activity: 9.3 | Language: Python

Input a local file or url and this service will transcribe it using Whisper AI. Completely private and Free 🤯🤯🤯

People interested in this might also be interested in transcribe-anything [1].
It automates video fetching and uses Whisper to generate .srt, .vtt, and .txt files.
[1] https://github.com/zackees/transcribe-anything
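
For readers unfamiliar with the .srt format mentioned above, here is a minimal sketch of turning timed segments into SubRip text. It is not taken from transcribe-anything; the helper names `srt_timestamp` and `to_srt` and the `(start, end, text)` tuple layout are assumptions made for this example.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_s, end_s, text) tuples (hypothetical layout)."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: running index, time range, text; cues are blank-line separated.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

A .vtt file is structured similarly but starts with a `WEBVTT` header line and uses a period rather than a comma before the milliseconds.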

frogbase

Mentions: 14 | Stars: 754 | Activity: 4.3 | Language: Python

Discontinued. Transform audio-visual content into navigable knowledge.
ai-notes

Mentions: 15 | Stars: 4,510 | Activity: 9.8 | Language: HTML

Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
nlp

Mentions: 2 | Stars: 419 | Activity: 4.5 | Language: Jupyter Notebook
whisperX

Mentions: 24 | Stars: 8,965 | Activity: 8.4 | Language: Python

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
audioclerk

Mentions: 1 | Stars: 13 | Activity: 0.0 | Language: Go

Transcribe audio to text, watch folders to auto transcribe audio

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
