OpenAI's Whisper: an open-sourced neural net "that approaches human level robustness and accuracy on English speech recognition." Can be used as a Python package or from the command line

This page summarizes the projects mentioned and recommended in the original post on /r/Python

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

  • Yeah it does seem that that's fast enough for real-time...but I don't know enough about the underpinnings of the model, like some phrases get almost instantaneously transcribed, and then there's big unexpected pauses (given that the sample audio has a consistent stream of words). I don't know if it has anything to do with Whisper being designed to do phrase-level tokenization (i.e. you can't get word-by-word timestamp data)

  • yt-dlp

    A feature-rich command-line audio/video downloader

  • I posted a command-line example here (it uses yt-dlp, aka youtube-dl to extract audio from an example online video:

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • sentence-transformers

    Multilingual Sentence & Image Embeddings with BERT

  • Did you use venv, or did you work around this another way?

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts