Python speech-synthesis

Open-source Python projects categorized as speech-synthesis

Top 23 Python speech-synthesis Projects

  1. TTS

    🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

    Project mention: Real-time Voice Chat at ~500ms Latency | news.ycombinator.com | 2025-05-05

    That is probably the reason you can't find that much.

    *https://coqui.ai/

  3. NeMo

    A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

    Project mention: Speaker Diarization in Python | dev.to | 2024-08-22

    NVIDIA NeMo: To perform speaker diarization using NVIDIA NeMo, follow these steps:

  4. PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

  5. espnet

    End-to-End Speech Processing Toolkit

  6. Amphion

    Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

    Project mention: AIM Weekly for 04Nov2024 | dev.to | 2024-11-04

    🌐 Composed Image Retrieval πŸ“Ž Intro to Multimodal LLama 3.2 πŸ› οΈ Multi Agent Concierge πŸ’» RAG with Langchain Granite, Milvus 🫢 Download content βœ… Transformer Replacement? πŸ€– vLLM for runing models 🌐 Amphion πŸ“ Autogluon πŸš™ Notebook LLama like Google's Notebook LLM 🫢 Monocle2ai for tracing GenAI app code LFA&D Project πŸ€– Bee Agent Framework βœ… LLama RFP Response ▢️ GenAI Script πŸ‘½ Simular AI Agent S 🦾 DrawDB with AI ✨ Ollama with LLama 3.2 Vision!!!! Preview πŸš• Powerful RAG Checker πŸ“Š SQL Generator πŸ’» Role of LLMs 🐍 Document Extraction πŸ•ΆοΈ Open Source Vector DB Reddit πŸ” The Practical Guide to Self Hosting LLM 🦾 Stagehand Controller πŸ•ΆοΈ Understanding HNSWLIB 🐍 Best practices in RAG πŸ’» Enigma Agent πŸ“ Langchain, Ollama, Phi3 for Function Calling πŸ”‹ Compass Judger πŸ“ Princeton NLP SimPO πŸ” Princeton NLP ProLong πŸ”‹ Princeton NLP HELMET 🧐 Ollama Cheatsheet πŸš• Princeton NLP CopyCat πŸ“Š Princeton NLP Shp πŸ•ΆοΈ Can LLM Solve Hard Github Issues πŸ“ Enabling Large Language Models to Generate Text with Citations πŸ”‹ Princeton NLP CharXiv πŸ“Š Awesome AI Agents List 🦾 Nomic’s Matryoshka text embedding model

  7. so-vits-svc-fork

    so-vits-svc fork with realtime support, improved interface and more features.

  8. edge-tts

    Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

    Project mention: Show HN: Voice Cloning and Multilingual TTS in One Click (Windows) | news.ycombinator.com | 2025-01-26

    There is a MIT license in the repo. In that sense it's open source.

    It's using "Edge TTS", which I believe means use API keys stolen [1] from Microsoft Edge and hope Microsoft doesn't sue you, non jolly-roger flying internet users beware.

    Can't speak to other models and their licenses, I stopped looking after I saw this since I don't feel the need to use this.

    [1] https://github.com/rany2/edge-tts/blob/ac41fb85ab2b2b48fef8a...

  10. EmotiVoice

    EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

  11. vits

    VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

  12. StyleTTS2

    StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

  13. DiffSinger

    DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

  14. metavoice-src

    Foundational model for human-like, expressive TTS

    Project mention: Ask HN: What is the state of OSS voice cloning? | news.ycombinator.com | 2024-09-30
  15. TensorFlowTTS

    😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

  16. voice-pro

    Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

    Project mention: Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool 🔊 | dev.to | 2025-02-10

    GitHub: https://github.com/abus-aikorea/voice-pro

  17. RealtimeTTS

    Converts text to speech in realtime

  18. tacotron

    A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

  19. lingvo

    Lingvo

  20. Tacotron-2

    DeepMind's Tacotron-2 Tensorflow implementation

  21. WaveRNN

    WaveRNN Vocoder + TTS

  22. hifi-gan

    HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

  23. kalliope

    Kalliope is a framework that will help you to create your own personal assistant.

  24. IMS-Toucan

    Controllable and fast Text-to-Speech for over 7000 languages!

    Project mention: Toucan TTS: MIT licensed Text to Speech in 7000 languages | news.ycombinator.com | 2024-06-20
  25. SpeechT5

    Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python speech-synthesis related posts

  • Show HN: Voice Cloning and Multilingual TTS in One Click (Windows)

    2 projects | news.ycombinator.com | 26 Jan 2025
  • Edge TTS

    4 projects | news.ycombinator.com | 22 Jan 2025
  • Show HN: Voice-Pro – AI Voice Cloning Magic: Transform Any Voice in 15 Seconds

    10 projects | news.ycombinator.com | 27 Nov 2024
  • Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model

    5 projects | news.ycombinator.com | 14 Oct 2024
  • Show HN: Offline audiobook from any format with one CLI command

    7 projects | news.ycombinator.com | 6 Oct 2024
  • Ask HN: What is the state of OSS voice cloning?

    6 projects | news.ycombinator.com | 30 Sep 2024
  • Show HN: Anycast+ – An AI-powered podcast app

    2 projects | news.ycombinator.com | 12 Aug 2024

Index

What are some of the best open-source speech-synthesis projects in Python? This list will help you:

 #  Project            Stars
 1  TTS               40,169
 2  NeMo              14,446
 3  PaddleSpeech      11,899
 4  espnet             9,120
 5  Amphion            9,069
 6  so-vits-svc-fork   9,007
 7  edge-tts           8,281
 8  EmotiVoice         7,987
 9  vits               7,180
10  StyleTTS2          5,732
11  DiffSinger         4,491
12  metavoice-src      4,121
13  TensorFlowTTS      3,925
14  voice-pro          3,672
15  RealtimeTTS        3,064
16  tacotron           2,972
17  lingvo             2,838
18  Tacotron-2         2,309
19  WaveRNN            2,154
20  hifi-gan           2,137
21  kalliope           1,728
22  IMS-Toucan         1,594
23  SpeechT5           1,336
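
The ordering rule used for this list (projects ranked by GitHub star count, highest first) can be reproduced with a few lines of Python. The sketch below is illustrative only: the star counts are copied from the first five rows of the table above, and the variable names are assumptions made for the example, not part of any tool used to build the list.

```python
# Star counts copied from the first five rows of the index table,
# deliberately listed out of order.
projects = [
    ("espnet", 9120),
    ("TTS", 40169),
    ("Amphion", 9069),
    ("NeMo", 14446),
    ("PaddleSpeech", 11899),
]

# Rank by star count, highest first, as the list's note describes.
ranked = sorted(projects, key=lambda item: item[1], reverse=True)

# Print rank, project name, and star count in aligned columns.
for rank, (name, stars) in enumerate(ranked, start=1):
    print(f"{rank:>2}  {name:<14}{stars:>7,}")
```

Sorting on the star count alone is enough here because no two projects in the table share the same count; a tie-breaker (for example, the project name) would be needed otherwise.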
