Python speech-to-text

Open-source Python projects categorized as speech-to-text

Top 23 Python speech-to-text Projects

speech-to-text
  1. faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server | dev.to | 2026-05-09

    Transcribes it locally using faster-whisper
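
    Here is a minimal sketch of that workflow. It assumes faster-whisper's `WhisperModel.transcribe`, which returns a lazy generator of segments (each with `start`, `end` and `text`) plus an info object. The `"small"` model size, the `int8` compute type and the helper names are illustrative choices, not details from the post. The import is deferred so the SRT formatting can be tried without the library installed.

```python
from typing import Iterable, Tuple

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) triples as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n")
    return "\n".join(cues)

def transcribe_to_srt(path: str, model_size: str = "small") -> str:
    """Transcribe locally with faster-whisper (must be installed) and return SRT text."""
    from faster_whisper import WhisperModel  # deferred: only needed for real audio

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, beam_size=5)
    return to_srt((s.start, s.end, s.text) for s in segments)
```

    With faster-whisper installed, `transcribe_to_srt("audio.mp3")` would return subtitle text ready to save as an `.srt` file. The file name is hypothetical. Because `segments` is a generator, the transcription actually runs while `to_srt` iterates over it.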

  3. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: The Unit Economics of Speech-to-Text Just Collapsed | dev.to | 2026-04-18

    Look at what arrived between mid-2023 and mid-2025. Gandhi et al.'s Distil-Whisper (2023) distilled large-v2 into a 756M-param student that runs 6× faster with a 1% WER gap on out-of-distribution audio, using large-scale pseudo-labelling. Georgi Gerganov's whisper.cpp made CPU-only and mobile inference a default rather than a party trick; a base.en checkpoint transcribes real-time on an M1 without touching a GPU. Max Bain's WhisperX added forced-alignment and diarization on top, so word-level timestamps and speaker labels stopped being a premium-tier differentiator.
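
    Diarization can be treated as an interval-overlap problem. Once each word has timestamps, it gets the speaker whose diarization turns overlap it the most. The sketch below illustrates that idea under those assumptions. It is not WhisperX's code, and the `Word`/`Turn` types and `assign_speakers` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float

def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Optional[str]]:
    """Label each word with the speaker whose turns overlap it the most, or None."""
    labels: List[Optional[str]] = []
    for word in words:
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        labels.append(max(totals, key=totals.get) if totals else None)
    return labels
```

    Words that fall outside every turn come back as `None`. A caller might fill them from the nearest turn instead; that choice is left open here.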

  4. pyvideotrans

    Translate the video from one language to another and embed dubbing & subtitles.

  5. speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: 5 must know open-source repositories to build cool AI apps | dev.to | 2025-10-29

    Star the Speech Brain repository ⭐

  6. voice-pro

    Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

    Project mention: Show HN: Likes/day as fake profile → built my own dating app in 100 days | news.ycombinator.com | 2025-12-16

  7. RealtimeSTT

    A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
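
    RealtimeSTT's README says it uses WebRTC VAD for initial detection and Silero VAD for verification. The sketch below swaps in a much simpler RMS-energy threshold. It is only meant to show the frame-by-frame logic of opening and closing speech segments. The 16 kHz frame size, the threshold and the `hangover` count are arbitrary assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def detect_speech(
    samples: Sequence[float],
    frame_size: int = 160,    # 10 ms per frame at an assumed 16 kHz sample rate
    threshold: float = 0.02,  # RMS level treated as speech (arbitrary assumption)
    hangover: int = 3,        # silent frames tolerated before a segment is closed
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of regions whose frame energy reaches threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_voiced_end = 0
    silent_run = 0
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = offset  # speech begins at this frame
            last_voiced_end = offset + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run > hangover:
                # Pause lasted too long: close the segment at the last voiced frame.
                segments.append((start, last_voiced_end))
                start, silent_run = None, 0
    if start is not None:
        segments.append((start, last_voiced_end))  # speech ran to the end of the buffer
    return segments
```

    The `hangover` parameter stops short pauses between words from splitting one utterance into several segments.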

  8. SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

  9. SenseVoice

    Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
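
    "Non-autoregressive" means the model scores every audio frame in a single pass instead of generating output tokens one at a time. This is one reason such models can beat an autoregressive decoder like Whisper's on latency. A common way to turn per-frame scores into text is greedy CTC decoding, sketched below. Treating SenseVoice's output layer as CTC is an assumption here, and the blank index and toy vocabulary are made up.

```python
from typing import List, Optional, Sequence

BLANK = 0  # index of the CTC blank symbol (assumed convention)

def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocab: Sequence[str]) -> str:
    """Pick the best label per frame, merge consecutive repeats, then drop blanks."""
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    tokens: List[str] = []
    previous: Optional[int] = None
    for label in best:
        # A label is emitted only when it differs from the previous frame's label,
        # so a blank between two identical labels keeps them as separate tokens.
        if label != previous and label != BLANK:
            tokens.append(vocab[label])
        previous = label
    return "".join(tokens)
```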

  10. mlx-audio

    A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

    Project mention: The Free, Open-Source Alternative to ElevenLabs Is Finally Here | dev.to | 2026-05-24

    uv pip install "git+https://github.com/Blaizzy/mlx-audio" --prerelease=allow
    uv pip install soundfile

  11. speech-to-speech

    Build local voice agents with open-source models

  12. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

    Project mention: How GPU-Powered Coding Agents Can Assist in Development of GPU-Accelerated Software | dev.to | 2026-02-28

    Imagine owning a massive Plex media library with hundreds of foreign-language films and TV shows. You want subtitles for everything, but manually sourcing them is a nightmare: mismatched timings, missing translations, incomplete coverage. Tools like Bazarr exist specifically to automate subtitle management for Plex and Sonarr/Radarr libraries, and they ship with built-in integration for whisper-asr-webservice, a self-hosted REST API that wraps OpenAI's Whisper speech recognition model. Point Bazarr at a whisper-asr-webservice endpoint, and it will automatically transcribe and generate subtitles for every piece of media in your library, in any language Whisper supports.
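
    The service speaks plain HTTP, so any client can do what Bazarr does. The sketch below assumes the `POST /asr` endpoint described in the project's documentation. It also assumes an `audio_file` form field, `task`/`language`/`output` query parameters, and port 9000 as in the Docker examples. Check all of these against the version you deploy. Only the standard library is used, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import uuid

def build_asr_request(
    audio: bytes,
    filename: str,
    base_url: str = "http://localhost:9000",  # assumed default port of the container
    task: str = "transcribe",
    output: str = "srt",
    language: str = "",
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to the service's /asr endpoint."""
    params = {"task": task, "output": output}
    if language:
        params["language"] = language
    url = f"{base_url.rstrip('/')}/asr?{urllib.parse.urlencode(params)}"

    # Hand-built multipart/form-data body with a single file field.
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

def transcribe(path: str, **kwargs) -> str:
    """Send a local file to a running whisper-asr-webservice and return the response text."""
    with open(path, "rb") as fh:
        request = build_asr_request(fh.read(), filename=path.rsplit("/", 1)[-1], **kwargs)
    with urllib.request.urlopen(request, timeout=600) as response:
        return response.read().decode("utf-8")
```

    Against a running container, `transcribe("episode.wav", language="de")` would return German SRT text. The file name is hypothetical. Bazarr sends this request for you; the sketch only shows that nothing beyond plain HTTP is involved.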

  13. LLaMA-Omni

    LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

  14. whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    Project mention: Why removing 'um' from a recording is harder than it sounds | news.ycombinator.com | 2026-06-12

    Yeah, that's in faster-whisper-xxl via the --diarize parameter with additional options to tweak how it works:

    https://github.com/Purfview/whisper-standalone-win/discussio...

    I haven't used it when subtitling, though, so I don't know much more.

  15. lingvo

    Lingvo is a framework for building neural networks in TensorFlow, particularly sequence models.

  16. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Project mention: Cohere Transcribe: Speech Recognition | news.ycombinator.com | 2026-03-31

    There is also: https://github.com/linto-ai/whisper-timestamped

    It doesn't use an extra model (so it supports every language that works with Whisper out of the box and uses less memory); it works by applying Dynamic Time Warping to cross-attention weights.
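
    Dynamic Time Warping finds the cheapest monotonic path through a tokens × frames cost matrix. In this minimal sketch, the cost is assumed to be the negated cross-attention weight, so strongly attended frames are cheap. Each token's start time is the first frame on its path multiplied by 20 ms, the duration of one Whisper encoder frame. whisper-timestamped's real pipeline (choice of attention heads, filtering, normalization) is not reproduced. The helper names and toy matrix are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def dtw_path(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost monotonic alignment path through a (tokens x frames) cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning the first i tokens with the first j frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the bottom-right corner, preferring the diagonal on ties.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): acc[i - 1][j - 1], (i - 1, j): acc[i - 1][j], (i, j - 1): acc[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path

def token_start_times(attention: Sequence[Sequence[float]], frame_seconds: float = 0.02) -> List[float]:
    """Start time of each token: the first frame aligned to it on the DTW path."""
    cost = [[-w for w in row] for row in attention]  # high attention -> low cost
    starts: Dict[int, int] = {}
    for token, frame in dtw_path(cost):
        starts.setdefault(token, frame)
    return [starts[t] * frame_seconds for t in range(len(attention))]
```

    With negated weights every cell costs at most zero. The path therefore tends to visit extra cells, and neighboring tokens can share a boundary frame. This is why the sketch reads off start frames rather than end frames.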

  17. kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  18. Dragonfire

    The open-source virtual assistant for Ubuntu-based Linux distributions

  19. airunner

    Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows

  20. whisper-ctranslate2

    Whisper command-line client based on CTranslate2, compatible with the original OpenAI client.

  21. StreamSpeech

    StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

  22. quillman

    A voice chat app

  23. dc_tts

    A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

  24. AI-Waifu-Vtuber

    AI VTuber for streaming on YouTube/Twitch

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python speech-to-text related posts

  • Why removing 'um' from a recording is harder than it sounds

    1 project | news.ycombinator.com | 12 Jun 2026
  • I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server

    2 projects | dev.to | 9 May 2026
  • Stop paying for AI transcription! 🎙️ WritHer: 100% Local Voice Assistant for Windows. Privacy-first, Whisper + Ollama powered. Open Source on GitHub!

    2 projects | dev.to | 1 May 2026
  • The Unit Economics of Speech-to-Text Just Collapsed

    2 projects | dev.to | 18 Apr 2026
  • Show HN: I built a sub-500ms latency voice agent from scratch

    7 projects | news.ycombinator.com | 2 Mar 2026
  • I Built a Voice Assistant That Runs Entirely in Your Browser

    9 projects | dev.to | 3 Mar 2026
  • I Made Tkinter Look Like a Modern Glassmorphic App - Here's the Dark Magic I Used

    1 project | dev.to | 26 Feb 2026

Index

What are some of the best open-source speech-to-text projects in Python? This list will help you:

Rank  Project                  GitHub stars
1     faster-whisper           23,557
2     whisperX                 22,445
3     pyvideotrans             17,918
4     speechbrain              11,610
5     voice-pro                10,931
6     RealtimeSTT              9,874
7     SpeechRecognition        8,970
8     SenseVoice               8,522
9     mlx-audio                7,345
10    speech-to-speech         4,873
11    whisper-asr-webservice   3,281
12    LLaMA-Omni               3,141
13    whisper-standalone-win   3,055
14    lingvo                   2,864
15    whisper-timestamped      2,818
16    kalliope                 1,753
17    Dragonfire               1,409
18    airunner                 1,315
19    whisper-ctranslate2      1,314
20    StreamSpeech             1,270
21    quillman                 1,205
22    dc_tts                   1,158
23    AI-Waifu-Vtuber          1,062
