OpenAI's Whisper: an open-sourced neural net "that approaches human level robustness and accuracy on English speech recognition." Can be used as a Python package or from the command line

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

whisper

344 60,303 6.4 Python

Robust Speech Recognition via Large-Scale Weak Supervision

Yeah it does seem that that's fast enough for real-time...but I don't know enough about the underpinnings of the model, like some phrases get almost instantaneously transcribed, and then there's big unexpected pauses (given that the sample audio has a consistent stream of words). I don't know if it has anything to do with Whisper being designed to do phrase-level tokenization (i.e. you can't get word-by-word timestamp data)

yt-dlp

2,360 70,636 9.8 Python

A feature-rich command-line audio/video downloader

I posted a command-line example here (it uses yt-dlp, aka youtube-dl to extract audio from an example online video:

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
sentence-transformers

45 13,793 9.2 Python

Multilingual Sentence & Image Embeddings with BERT

Did you use venv, or did you work around this another way?

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project