Python speech-recognition

Open-source Python projects categorized as speech-recognition

Top 23 Python speech-recognition Projects

  1. transformers

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

    Project mention: The $100 ChatGPT: Why Karpathy's nanochat Represents the Next Big Thing | dev.to | 2026-05-04

    Hugging Face Transformers: 500,000+ lines

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server | dev.to | 2026-05-09

    Transcribes it locally using faster-whisper

  4. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: The Unit Economics of Speech-to-Text Just Collapsed | dev.to | 2026-04-18

    Look at what arrived between mid-2023 and mid-2025. Gandhi et al.'s Distil-Whisper (2023) distilled large-v2 into a 756M-param student that runs 6× faster with a 1% WER gap on out-of-distribution audio, using large-scale pseudo-labelling. Georgi Gerganov's whisper.cpp made CPU-only and mobile inference a default rather than a party trick; a base.en checkpoint transcribes real-time on an M1 without touching a GPU. Max Bain's WhisperX added forced-alignment and diarization on top, so word-level timestamps and speaker labels stopped being a premium-tier differentiator.

  5. FunASR

    Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

    Project mention: CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution | dev.to | 2025-12-15

    FunASR - Automatic Speech Recognition

  6. PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

  7. speechbrain

    A PyTorch-based Speech Toolkit

    Project mention: 5 must know open-source repositories to build cool AI apps | dev.to | 2025-10-29

    Star the Speech Brain repository ⭐

  8. voice-pro

    Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

    Project mention: Show HN: Likes/day as fake profile → built my own dating app in 100 days | news.ycombinator.com | 2025-12-16
  9. espnet

    End-to-End Speech Processing Toolkit

  10. SpeechRecognition

    Speech recognition module for Python, supporting several engines and APIs, online and offline.

  11. SenseVoice

    Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.

  12. mlx-audio

    A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

    Project mention: The Free, Open-Source Alternative to ElevenLabs Is Finally Here | dev.to | 2026-05-24

    uv pip install "git+https://github.com/Blaizzy/mlx-audio" --prerelease=allow
    uv pip install soundfile

  13. wenet

    Production First and Production Ready End-to-End Speech Recognition Toolkit

    Project mention: CosyVoice 2025 Complete Guide: The Ultimate Multi-lingual Text-to-Speech Solution | dev.to | 2025-12-15

    WeNet - Speech Recognition Toolkit

  14. Porcupine

    On-device wake word detection powered by deep learning

    Project mention: Porcupine – On-device wake word detection powered by deep learning | news.ycombinator.com | 2026-03-05
  15. ml-road

    Machine Learning and Agentic AI Resources, Practice and Research

    Project mention: Neural Networks: Zero to Hero | news.ycombinator.com | 2026-01-04

    Well, no ... For a start any "AI" course 20 years ago probably wouldn't have even mentioned neural nets, and certainly not as a mainstream technique.

    A 20-year-old "AI" curriculum would have looked more like the 3rd edition of Russell & Norvig's "Artificial Intelligence - A Modern Approach".

    https://github.com/yanshengjia/ml-road/blob/master/resources...

    Karpathy's videos aren't an AI course (except in the modern sense of AI=LLMs), or even a machine learning course, or even a neural network course for that matter (despite the title) - it's really just "From Zero to LLMs".

  16. distil-whisper

    Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

  17. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

    Project mention: How GPU-Powered Coding Agents Can Assist in Development of GPU-Accelerated Software | dev.to | 2026-02-28

    Imagine owning a massive Plex media library with hundreds of foreign-language films and TV shows. You want subtitles for everything, but manually sourcing them is a nightmare — mismatched timings, missing translations, incomplete coverage. Tools like Bazarr exist specifically to automate subtitle management for Plex and Sonarr/Radarr libraries, and they ship with built-in integration for whisper-asr-webservice — a self-hosted REST API that wraps OpenAI's Whisper speech recognition model. Point Bazarr at a whisper-asr-webservice endpoint, and it will automatically transcribe and generate subtitles for every piece of media in your library, in any language Whisper supports.

  18. whisper-standalone-win

    Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

    Project mention: Why removing 'um' from a recording is harder than it sounds | news.ycombinator.com | 2026-06-12

    Yeah, that's in faster-whisper-xxl via the --diarize parameter with additional options to tweak how it works:

    https://github.com/Purfview/whisper-standalone-win/discussio...

    I haven't used it when subtitling, though, so I don't know much more.

  19. lingvo

    Lingvo

  20. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Project mention: Cohere Transcribe: Speech Recognition | news.ycombinator.com | 2026-03-31

    There is also: https://github.com/linto-ai/whisper-timestamped

    It doesn't use an extra model (so it supports every language that works with Whisper out of the box and uses less memory); it works by applying Dynamic Time Warping to cross-attention weights.

  21. lip-reading-deeplearning

    Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

  22. kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  23. SALMONN

    SALMONN family: A suite of advanced multi-modal LLMs

  24. SpeechT5

    Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
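The whisper-timestamped mention above says word timestamps come from applying Dynamic Time Warping to cross-attention weights. The following is only a toy sketch of that idea, not the project's actual code: the `attention` matrix is made-up data, and `dtw_path` and `token_frame_spans` are hypothetical helper names. It treats `1 - weight` as the cost of pairing a text token with an audio frame and finds the cheapest monotonic alignment.

```python
from math import inf


def dtw_path(cost):
    """Return (total_cost, path) for the cheapest monotonic alignment.

    cost[i][j] is the cost of pairing row i (a text token) with
    column j (an audio frame). The path is a list of (row, col) pairs.
    """
    n, m = len(cost), len(cost[0])
    # acc[i][j] = cheapest cost of aligning the first i rows with the first j columns.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the bottom-right corner to recover the alignment.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # advance token and frame
            (acc[i - 1][j], i - 1, j),          # advance token only
            (acc[i][j - 1], i, j - 1),          # advance frame only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path


def token_frame_spans(path):
    """Collapse a DTW path into {token_index: (first_frame, last_frame)}."""
    spans = {}
    for token, frame in path:
        first, last = spans.get(token, (frame, frame))
        spans[token] = (min(first, frame), max(last, frame))
    return spans


# Made-up attention weights: 3 text tokens (rows) x 5 audio frames (columns).
attention = [
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.9, 0.9],
]
# High attention should mean low alignment cost.
cost = [[1.0 - w for w in row] for row in attention]

total, path = dtw_path(cost)
spans = token_frame_spans(path)
print(spans)  # {0: (0, 1), 1: (2, 2), 2: (3, 4)}
```

Multiplying each frame index by the frame duration would turn these spans into start and end times. A real system operates on the model's own attention tensors and typically adds filtering and normalization that this sketch leaves out.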

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python speech-recognition related posts

  • I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server

    2 projects | dev.to | 9 May 2026
  • Deep Dive: OpenAI Whisper 2.0 vs. Deepgram 2.0 for Code Transcription in 2026

    2 projects | dev.to | 28 Apr 2026
  • The Unit Economics of Speech-to-Text Just Collapsed

    2 projects | dev.to | 18 Apr 2026
  • How to Migrate from Deprecated VAPI Transcriber Endpoints to Deepgram v2 in Retell AI Agents

    1 project | dev.to | 15 Apr 2026
  • mlx-audio: Speech Processing Library on Apple Silicon

    1 project | dev.to | 18 Mar 2026
  • I Built a Voice Assistant That Runs Entirely in Your Browser

    9 projects | dev.to | 3 Mar 2026
  • How GPU-Powered Coding Agents Can Assist in Development of GPU-Accelerated Software

    2 projects | dev.to | 28 Feb 2026

Index

What are some of the best open-source speech-recognition projects in Python? This list will help you:

# Project Stars
1 transformers 161,558
2 faster-whisper 23,557
3 whisperX 22,445
4 FunASR 17,920
5 PaddleSpeech 12,614
6 speechbrain 11,610
7 voice-pro 10,931
8 espnet 9,858
9 SpeechRecognition 8,970
10 SenseVoice 8,601
11 mlx-audio 7,345
12 wenet 5,138
13 Porcupine 4,851
14 ml-road 4,825
15 distil-whisper 4,065
16 whisper-asr-webservice 3,281
17 whisper-standalone-win 3,080
18 lingvo 2,864
19 whisper-timestamped 2,818
20 lip-reading-deeplearning 1,903
21 kalliope 1,753
22 SALMONN 1,449
23 SpeechT5 1,440
