Python Speech

Open-source Python projects categorized as Speech

Top 23 Python Speech Projects

  1. TTS

    🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

    Project mention: Show HN: Voice-Pro – AI Voice Cloning Magic: Transform Any Voice in 15 Seconds | news.ycombinator.com | 2024-11-27

    It's really easy for a technical person to do as well.

    I use Coqui TTS[0] as part of my home automation. I wrote a small Python script that lets me upload a voice clip for it to clone, after I got the idea from HeyWillow[1], and a small shim that lets me send the output to a Home Assistant media player instead of using their standard output device. I run the TTS container on a VM with a Tesla P4 (~£100 to buy) and get about 1x-2x real time using the large model (processing takes roughly as long as the text would take to say).

    Just for a giggle, I uploaded a few 3-5 second clips of myself speaking and cloned my voice, then executed a command to our living room media player to call my wife into the room; from another room, she was 100% convinced it was me speaking words I'd never spoken.

    I tried playing with a variety of sentences for a few hours and overall, it sounded almost exactly like me, to me, with the exception of some "attitude" and "intonation" I know I wouldn't use in my speech. I didn't notice much of an improvement using much longer clips; the short ones were "good enough".

    Tangentially, it really bugs me that most phone providers in the UK now insist you record a "personal greeting" before they'll let you check your voicemail box. I just record silence, because the last thing I want or need is a voicemail greeting in my voice confirming to some randomer I didn't want calling me who I am and that my number is active, even more so knowing how I can

    [0] https://github.com/coqui-ai/TTS

  3. MockingBird

    🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time

  4. datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: 20 Open Source Tools I Recommend to Build, Share, and Run AI Projects | dev.to | 2024-11-13

    Datasets library repository for accessing and sharing datasets with the community.

  5. whisperX

    WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

    Project mention: Ask HN: Is Whisper Still Relevant? | news.ycombinator.com | 2025-02-12

    Yes it's still relevant but I prefer WhisperX for some tasks: https://github.com/m-bain/whisperX

  6. AudioGPT

    AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

  7. EmotiVoice

    EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

  8. modelscope

    ModelScope: bring the notion of Model-as-a-Service to life.

  10. silero-vad

    Silero VAD: pre-trained enterprise-grade Voice Activity Detector

    Project mention: AI Voice Agents: Opensource, Pre-Trained Voice Activity Detector | news.ycombinator.com | 2024-07-28

  11. metavoice-src

    Foundational model for human-like, expressive TTS

    Project mention: Ask HN: What is the state of OSS voice cloning? | news.ycombinator.com | 2024-09-30

  12. DeepFilterNet

    Noise suppression using deep filtering

    Project mention: Real-time ML audio noise suppression on Raspberry Pi Pico 2 | news.ycombinator.com | 2024-08-09

    Very cool! Would be curious to see how this compares to https://github.com/Rikorose/DeepFilterNet written in Rust.

    Or this Samsung Research paper https://research.samsung.com/blog/FSPEN-AN-ULTRA-LIGHTWEIGHT...

  13. lingvo

    Lingvo

  14. aeneas

    aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

  15. whisper-asr-webservice

    OpenAI Whisper ASR Webservice API

  16. gTTS

    Python library and CLI tool to interface with Google Translate's text-to-speech API

    Project mention: Using Groq to Build a Real-Time Language Translation App | dev.to | 2024-04-05

    For our real-time TTS needs, we'll employ the fantastic library called gTTS.

  17. whisper-timestamped

    Multilingual Automatic Speech Recognition with word-level timestamps and confidence

  18. IMS-Toucan

    Controllable and fast Text-to-Speech for over 7000 languages!

    Project mention: Toucan TTS: MIT licensed Text to Speech in 7000 languages | news.ycombinator.com | 2024-06-20

  19. SALMONN

    SALMONN: Speech Audio Language Music Open Neural Network

  20. dc_tts

    A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

  21. voicefixer

    General Speech Restoration

  22. StreamSpeech

    StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

    Project mention: Ask HN: Real-time speech-to-speech translation | news.ycombinator.com | 2024-10-24

    Has anyone had any luck with an offline, free, open-source real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)?

    * https://github.com/ictnlp/StreamSpeech

    * https://github.com/k2-fsa/sherpa-onnx

    * https://github.com/openai/whisper

    I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Basically, a Babelfish that doesn't stick in the ear. Although real-time would be great, a 3- to 5-second delay is manageable.

    RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.

    Any suggestions?

  23. pykaldi

    A Python wrapper for Kaldi

  24. lhotse

    Tools for handling speech data in machine learning projects.

  25. NATSpeech

    A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Speech related posts

  • Show HN: Mikey – No bot meeting notetaker for Windows

    6 projects | news.ycombinator.com | 12 Feb 2025
  • Ask HN: Is Whisper Still Relevant?

    2 projects | news.ycombinator.com | 12 Feb 2025
  • Show HN: Using YOLO to Detect Office Chairs in 40M Hotel Photos

    4 projects | news.ycombinator.com | 25 Jan 2025
  • Transcriber AI – Free, end-to-end machine based transcription with speaker id

    1 project | news.ycombinator.com | 16 Dec 2024
  • Supercharge Your AI Skills: 5 Open Source Repositories You Can't Afford to Miss

    5 projects | dev.to | 21 Nov 2024
  • Show HN: Offline audiobook from any format with one CLI command

    7 projects | news.ycombinator.com | 6 Oct 2024
  • WhisperX: Precise ASR with Word-Level Timestamps and Diarization

    1 project | news.ycombinator.com | 5 Sep 2024

Index

What are some of the best open-source Speech projects in Python? This list will help you:

#   Project                 Stars
1   TTS                     38,861
2   MockingBird             36,008
3   datasets                19,851
4   whisperX                14,571
5   AudioGPT                10,116
6   EmotiVoice               7,753
7   modelscope               7,598
8   silero-vad               5,314
9   metavoice-src            4,076
10  DeepFilterNet            2,930
11  lingvo                   2,833
12  aeneas                   2,609
13  whisper-asr-webservice   2,454
14  gTTS                     2,423
15  whisper-timestamped      2,322
16  IMS-Toucan               1,564
17  SALMONN                  1,191
18  dc_tts                   1,159
19  voicefixer               1,113
20  StreamSpeech             1,041
21  pykaldi                  1,010
22  lhotse                     987
23  NATSpeech                  971
