Python Speech

Open-source Python projects categorized as Speech

Top 23 Python Speech Projects

  1. TTS

    🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

    Project mention: My Journey to a reliable and enjoyable locally hosted voice assistant | news.ycombinator.com | 2026-03-16

    actually the hardest part of a locally hosted voice assistant isn't the llm. it's making the tts tolerable to actually talk to every day.

    the core issue is prosody: kokoro and piper are trained on read speech, but conversational responses have shorter breath groups and different stress patterns on function words. that's why numbers, addresses, and hedged phrases sound off even when everything else works.

    the fix is training data composition. conversational and read speech have different prosody distributions and models don't generalize across them. for self-hosted, coqui xtts-v2 [1] is worth trying if you want more natural english output than kokoro.

    btw i'm lily, cofounder of rime [2]. we're solving this for business voice agents at scale, not really the personal home assistant use case, but the underlying problem is the same.

    [1] https://github.com/coqui-ai/TTS

  3. MockingBird

    🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time

  4. VoxCPM

    VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

    Project mention: Rust RAG, Tokenizer-Free TTS (VoxCPM2), & Project NOMAD: Local AI & Offline Deployments | dev.to | 2026-05-30

    Source: https://github.com/OpenBMB/VoxCPM

  5. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: The Unit Economics of Speech-to-Text Just Collapsed | dev.to | 2026-04-18

    Look at what arrived between mid-2023 and mid-2025. Gandhi et al.'s Distil-Whisper (2023) distilled large-v2 into a 756M-param student that runs 6× faster with a 1% WER gap on out-of-distribution audio, using large-scale pseudo-labelling. Georgi Gerganov's whisper.cpp made CPU-only and mobile inference a default rather than a party trick; a base.en checkpoint transcribes real-time on an M1 without touching a GPU. Max Bain's WhisperX added forced-alignment and diarization on top, so word-level timestamps and speaker labels stopped being a premium-tier differentiator.
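The "WER gap" quoted above refers to word error rate, which is conventionally computed as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch under that definition follows; the function name is hypothetical, and real evaluations usually normalize casing and punctuation first, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A one-point gap then means, roughly, one additional word error per hundred reference words.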

  6. datasets

    🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

    Project mention: GSoC 2026 Predictions: 30 NEW AI/ML/Security Organizations You Should Start Contributing to NOW! | dev.to | 2026-02-06
  7. AudioGPT

    AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

  8. silero-vad

    Silero VAD: pre-trained enterprise-grade Voice Activity Detector

    Project mention: Show HN: Apple's Sharp Running in the Browser via ONNX Runtime Web | news.ycombinator.com | 2026-05-03

    So I use a VAD onnx (Silero [1]) to automatically detect when someone is talking, and then it sends the audio into one of the voice recognition libraries.

    I originally tried to get away with just Whisper Tiny in the chess game [2], but it performs worse on the kinds of short phrases (knight E4, c takes d5, etc) used to dictate chess notation. Even with hotword-based phrasing and corrections, I found its accuracy on brief inputs noticeably poorer. So I switched over to Sherpa [3] trained on gigaspeech. It’s significantly more accurate, but it also comes with a correspondingly larger memory footprint.

    Ideally, I would have used just one engine, but I needed a fallback for iOS devices (especially older ones) which can easily OOM.

    [1] - https://github.com/snakers4/silero-vad

    [2] - https://shahkur.specr.net

    [3] - https://github.com/k2-fsa/sherpa-onnx
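The gating pattern described in this mention (detect speech first, then hand only the speech to a recognizer) can be sketched as follows. This is not Silero's API: `energy_is_speech` is a toy energy threshold standing in for a real VAD model, and `gate_segments`, the `hangover` parameter, and the frame layout are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, List, Sequence

Frame = Sequence[float]


def energy_is_speech(frame: Frame, threshold: float = 0.01) -> bool:
    """Toy stand-in for a VAD model: mean squared amplitude above a threshold."""
    return sum(s * s for s in frame) / max(len(frame), 1) > threshold


def gate_segments(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool] = energy_is_speech,
    hangover: int = 2,
) -> List[List[float]]:
    """Group speech frames into segments.

    A segment is closed once `hangover` consecutive non-speech frames
    are seen. Non-speech frames themselves are dropped.
    """
    segments: List[List[float]] = []
    current: List[float] = []
    silence = 0
    for frame in frames:
        if is_speech(frame):
            current.extend(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= hangover:
                segments.append(current)
                current = []
                silence = 0
    if current:  # flush a segment still open at end of stream
        segments.append(current)
    return segments


if __name__ == "__main__":
    loud, quiet = [0.5] * 4, [0.0] * 4
    stream = [quiet, loud, loud, quiet, loud, quiet, quiet, loud]
    for segment in gate_segments(stream):
        # A real pipeline would pass each segment to a speech recognizer here.
        print(len(segment), "samples")
```

Swapping in a real detector would only require a different `is_speech` callable; the grouping logic stays the same.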

  9. modelscope

    ModelScope: bring the notion of Model-as-a-Service to life.

  10. EmotiVoice

    EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

  11. speech-to-speech

    Build local voice agents with open-source models

  12. ultravox

    A fast multimodal LLM for real-time voice

  13. metavoice-src

    Foundational model for human-like, expressive TTS

  14. DeepFilterNet

    Noise suppression using deep filtering

    Project mention: Show HN: Background noise removal in multimedia with a single command | news.ycombinator.com | 2025-10-06
  15. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

    Project mention: How GPU-Powered Coding Agents Can Assist in Development of GPU-Accelerated Software | dev.to | 2026-02-28

    Imagine owning a massive Plex media library with hundreds of foreign-language films and TV shows. You want subtitles for everything, but manually sourcing them is a nightmare — mismatched timings, missing translations, incomplete coverage. Tools like Bazarr exist specifically to automate subtitle management for Plex and Sonarr/Radarr libraries, and they ship with built-in integration for whisper-asr-webservice — a self-hosted REST API that wraps OpenAI's Whisper speech recognition model. Point Bazarr at a whisper-asr-webservice endpoint, and it will automatically transcribe and generate subtitles for every piece of media in your library, in any language Whisper supports.
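As a rough illustration of what "pointing a client at the endpoint" involves, the sketch below builds a request URL for the service. The `/asr` path and the `task`, `language`, and `output` query parameters are assumptions based on the project's documented usage and may differ between versions; `build_asr_url` is a hypothetical helper, and the audio upload itself (a multipart POST) is not shown.

```python
from typing import Optional
from urllib.parse import urlencode


def build_asr_url(
    base_url: str,
    task: str = "transcribe",
    language: Optional[str] = None,
    output: str = "srt",
) -> str:
    """Build a transcription request URL.

    Assumed, not verified here: the service exposes `/asr` and accepts
    `task`, `language`, and `output` as query parameters.
    """
    params = {"task": task, "output": output}
    if language:
        params["language"] = language  # omit to let the model detect the language
    return f"{base_url.rstrip('/')}/asr?{urlencode(params)}"


if __name__ == "__main__":
    # Host and port are placeholders for wherever the service is running.
    print(build_asr_url("http://localhost:9000/", language="ja"))
```

Check the API documentation served by your own deployment before relying on these parameter names.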

  16. lingvo

    Lingvo

  17. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

    Project mention: Cohere Transcribe: Speech Recognition | news.ycombinator.com | 2026-03-31

    There is also: https://github.com/linto-ai/whisper-timestamped

    It doesn't use an extra model (so it supports every language that works with Whisper out of the box and uses less memory); it works by applying Dynamic Time Warping to cross-attention weights.
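To make the Dynamic Time Warping idea concrete, here is a minimal pure-Python sketch. It is not whisper-timestamped's implementation: the attention matrix is synthetic, turning weights into costs as `1 - weight` is an assumption, and `dtw_path` and `token_start_frames` are hypothetical names. The goal is only to show how a monotonic path through a token-by-frame matrix yields a start frame per token.

```python
from typing import List, Optional, Tuple


def dtw_path(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Return the minimum-cost monotonic path from (0, 0) to (n-1, m-1)."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # acc[i][j] = cheapest accumulated cost of a path ending at cell (i-1, j-1).
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1],  # advance to next token and next frame
                acc[i - 1][j],      # next token, same frame
                acc[i][j - 1],      # same token, next frame
            )

    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def token_start_frames(path: List[Tuple[int, int]], n_tokens: int) -> List[Optional[int]]:
    """First frame index the path assigns to each token."""
    starts: List[Optional[int]] = [None] * n_tokens
    for token, frame in path:
        if starts[token] is None:
            starts[token] = frame
    return starts


if __name__ == "__main__":
    # Synthetic attention: rows are tokens, columns are audio frames.
    attention = [
        [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
        [0.1, 0.1, 0.8, 0.9, 0.1, 0.0],
        [0.0, 0.1, 0.1, 0.1, 0.9, 1.0],
    ]
    cost = [[1.0 - w for w in row] for row in attention]
    print(token_start_frames(dtw_path(cost), len(attention)))
```

Multiplying each start frame by the frame duration would give a timestamp in seconds; that duration depends on the model and is left out here.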

  18. aeneas

    aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

  19. gTTS

    Python library and CLI tool to interface with Google Translate's text-to-speech API

  20. IMS-Toucan

    Controllable and fast Text-to-Speech for over 7000 languages!

    Project mention: IMS Toucan – Text-to-Speech for over 7000 Languages | news.ycombinator.com | 2026-01-02
  21. openai-edge-tts

    Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs

  22. SALMONN

    SALMONN family: A suite of advanced multi-modal LLMs

  23. voicefixer

    General Speech Restoration

  24. StreamSpeech

    StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Speech related posts

  • A beginner's guide to the Whisperx-A40-Large model by Victor-Upmeet on Replicate

    1 project | dev.to | 4 Jan 2026
  • AI Twin — Voice Cloning with Text-to-Speech

    2 projects | dev.to | 16 Dec 2025
  • Making AI Models Faster, Cheaper, and Greener — Here’s How

    5 projects | dev.to | 3 Nov 2025
  • 2025 Voice AI Guide: How to Make Your Own Real-Time Voice Agent (Part-1)

    7 projects | dev.to | 20 Sep 2025
  • Ask HN: What Speaker Diarization tools should I look into?

    1 project | news.ycombinator.com | 23 Jul 2025
  • Training with Big Data on Any Cloud

    4 projects | dev.to | 20 Jun 2025
  • Show HN: Mikey – No bot meeting notetaker for Windows

    6 projects | news.ycombinator.com | 12 Feb 2025

Index

What are some of the best open-source Speech projects in Python? This list will help you:

# Project Stars
1 TTS 45,294
2 MockingBird 36,904
3 VoxCPM 28,778
4 whisperX 22,445
5 datasets 21,620
6 AudioGPT 10,205
7 silero-vad 9,308
8 modelscope 8,963
9 EmotiVoice 8,479
10 speech-to-speech 4,873
11 ultravox 4,449
12 metavoice-src 4,194
13 DeepFilterNet 4,056
14 whisper-asr-webservice 3,281
15 lingvo 2,864
16 whisper-timestamped 2,818
17 aeneas 2,811
18 gTTS 2,620
19 IMS-Toucan 2,203
20 openai-edge-tts 1,928
21 SALMONN 1,449
22 voicefixer 1,327
23 StreamSpeech 1,270
