Top 23 Python speech-synthesis Projects
-
That is probably the reason you can't find that much.
https://coqui.ai/
-
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NVIDIA NeMo: To perform speaker diarization using NVIDIA NeMo, follow these steps:
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
-
edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Project mention: Show HN: Voice Cloning and Multilingual TTS in One Click (Windows) | news.ycombinator.com | 2025-01-26
There is an MIT license in the repo. In that sense it's open source.
It's using "Edge TTS", which I believe means use API keys stolen [1] from Microsoft Edge and hope Microsoft doesn't sue you, non jolly-roger flying internet users beware.
Can't speak to other models and their licenses, I stopped looking after I saw this since I don't feel the need to use this.
[1] https://github.com/rany2/edge-tts/blob/ac41fb85ab2b2b48fef8a...
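As a rough illustration of the edge-tts description above ("use Microsoft Edge's online text-to-speech service from Python"), here is a minimal sketch. It assumes the `edge-tts` package is installed and that the online service is reachable. The helper name `synthesize`, the output path, and the default voice `en-US-AriaNeural` are illustrative choices, not taken from this page.

```python
import asyncio


async def synthesize(text: str, out_path: str, voice: str = "en-US-AriaNeural") -> None:
    """Send `text` to the online TTS service and write the audio to `out_path`."""
    # Imported lazily so this module still loads when edge-tts is not installed.
    import edge_tts  # assumption: installed via `pip install edge-tts`

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)


# Example call (needs network access, so it is left commented out):
# asyncio.run(synthesize("Hello from Python.", "hello.mp3"))
```

The call is asynchronous because the library streams audio from a remote service; wrapping it in `asyncio.run` is the simplest way to use it from a synchronous script.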
-
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
-
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
-
DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
-
Project mention: Ask HN: What is the state of OSS voice cloning? | news.ycombinator.com | 2024-09-30
-
TensorFlowTTS
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
-
voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
Project mention: Voice-Pro: Ultimate AI Voice Conversion and Multilingual Translation Tool | dev.to | 2025-02-10
GitHub: https://github.com/abus-aikorea/voice-pro
-
tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
-
Project mention: Toucan TTS: MIT licensed Text to Speech in 7000 languages | news.ycombinator.com | 2024-06-20
Python speech-synthesis discussion
Python speech-synthesis related posts
-
Show HN: Voice Cloning and Multilingual TTS in One Click (Windows)
-
Edge TTS
-
Show HN: Voice-Pro - AI Voice Cloning Magic: Transform Any Voice in 15 Seconds
-
Play 3.0 mini - A lightweight, reliable, cost-efficient Multilingual TTS model
-
Show HN: Offline audiobook from any format with one CLI command
-
Ask HN: What is the state of OSS voice cloning?
-
Show HN: Anycast+ - An AI-powered podcast app
-
Index
What are some of the best open-source speech-synthesis projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | TTS | 40,169 |
2 | NeMo | 14,446 |
3 | PaddleSpeech | 11,899 |
4 | espnet | 9,120 |
5 | Amphion | 9,069 |
6 | so-vits-svc-fork | 9,007 |
7 | edge-tts | 8,281 |
8 | EmotiVoice | 7,987 |
9 | vits | 7,180 |
10 | StyleTTS2 | 5,732 |
11 | DiffSinger | 4,491 |
12 | metavoice-src | 4,121 |
13 | TensorFlowTTS | 3,925 |
14 | voice-pro | 3,672 |
15 | RealtimeTTS | 3,064 |
16 | tacotron | 2,972 |
17 | lingvo | 2,838 |
18 | Tacotron-2 | 2,309 |
19 | WaveRNN | 2,154 |
20 | hifi-gan | 2,137 |
21 | kalliope | 1,728 |
22 | IMS-Toucan | 1,594 |
23 | SpeechT5 | 1,336 |
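If you want to filter or re-sort this index programmatically, the rows can be parsed with a few lines of Python. The sketch below is only an illustration: the `Project` record and `parse_index` helper are hypothetical names, and `SAMPLE` copies just the first three rows of the table above.

```python
from typing import NamedTuple


class Project(NamedTuple):
    rank: int
    name: str
    stars: int


def parse_index(table: str) -> list[Project]:
    """Parse rows shaped like '1 | TTS | 40,169 |' into Project records."""
    projects = []
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the header row, the '---' separator row, and blank lines.
        if len(cells) != 3 or not cells[0].isdigit():
            continue
        rank, name, stars = cells
        # Star counts use thousands separators, e.g. '40,169'.
        projects.append(Project(int(rank), name, int(stars.replace(",", ""))))
    return projects


SAMPLE = """\
# | Project | Stars |
---|---|---|
1 | TTS | 40,169 |
2 | NeMo | 14,446 |
3 | PaddleSpeech | 11,899 |
"""

projects = parse_index(SAMPLE)
# Names of projects with more than 12,000 stars.
over_12k = [p.name for p in projects if p.stars > 12_000]
```

Star counts change over time, so any threshold applied this way reflects only the snapshot shown in the table.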