Top 23 Python speech-synthesis Projects
-
Project mention: OpenAI deems its voice cloning tool too risky for general release | news.ycombinator.com | 2024-03-31
lol this marketing technique is getting very old. https://github.com/coqui-ai/TTS is already amazing and open source.
-
PaddleSpeech
Easy-to-use speech toolkit including a self-supervised learning model, SOTA/streaming ASR with punctuation, streaming TTS with a text frontend, a speaker verification system, end-to-end speech translation, and keyword spotting. Won the NAACL 2022 Best Demo Award.
PaddlePaddle/PaddleSpeech
-
Project mention: [P] Making a TTS voice, HK-47 from Kotor using Tortoise (Ideally WaveRNN) | /r/MachineLearning | 2023-07-06
I haven't tested WaveRNN, but of the ones I know, the best open-source option is FastPitch. It's easy to use; here is the tutorial for voice cloning.
-
Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17
You might check out this list from ESPnet. They list the different corpora they use to train their models, sorted by language and task (ASR, TTS, etc.):
-
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
And the voice encapsulation system VITS https://github.com/jaywalnut310/vits
-
DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
-
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
-
TensorFlowTTS
TensorFlowTTS: Real-Time State-of-the-Art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
Hey HN, has anyone found a viable solution for doing this locally and offline on iOS? I'd like to offer a privacy-friendly text-to-speech feature in my app, and Apple's speech synthesis sounds awful compared to some newer models and TTS engines. The only thing I've found is an older TensorFlowTTS example here: https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/ios
Any pointers or tips appreciated.
-
edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Project mention: [discussion] text to voice generation for textbooks (non-math part) | /r/MachineLearning | 2023-12-01
I would very much like to use it to turn the text parts of a book into audio that I could listen to while reading. I used Edge's TTS by copying a paragraph to the clipboard and passing it to edge-tts to listen to the text, but this causes two problems: 1. you need an internet connection and have to keep the book open; 2. it only works paragraph by paragraph, is prone to errors, and if you use it too much it sometimes won't convert the full text afterwards.
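The paragraph-by-paragraph problem comes from pasting text in by hand. A minimal sketch, assuming a plain-text book with blank lines between paragraphs: `chunk_paragraphs` (a hypothetical helper, not part of edge-tts) packs paragraphs into chunks under an assumed 2,000-character limit, splitting oversized paragraphs at sentence ends, and `synthesize` passes each chunk to `edge_tts.Communicate(text, voice)` and awaits `.save(...)`. The chunk size, the voice name `en-US-AriaNeural`, and the output file names are assumptions. This does not solve the first problem: edge-tts still calls Microsoft's online service, so an internet connection is required.

```python
import re


def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    Paragraphs longer than max_chars are split at sentence ends (. ! ?);
    a single sentence longer than max_chars is kept whole.
    """
    units: list[tuple[str, str]] = []  # (piece, separator placed before it)
    for para in re.split(r"\n\s*\n", text):
        para = " ".join(para.split())  # collapse hard line wraps inside a paragraph
        if not para:
            continue
        if len(para) <= max_chars:
            units.append((para, "\n\n"))
        else:
            sentences = re.split(r"(?<=[.!?])\s+", para)
            units.append((sentences[0], "\n\n"))
            units.extend((s, " ") for s in sentences[1:])

    chunks: list[str] = []
    current = ""
    for piece, sep in units:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # still fits: keep growing this chunk
        else:
            if current:
                chunks.append(current)
            current = piece  # start a new chunk with this piece
    if current:
        chunks.append(current)
    return chunks


async def synthesize(chunks: list[str], voice: str = "en-US-AriaNeural",
                     prefix: str = "chapter") -> None:
    # Imported here so chunk_paragraphs works without edge-tts installed.
    import edge_tts

    for i, chunk in enumerate(chunks):
        # One request per chunk; the voice name is an assumed default.
        communicate = edge_tts.Communicate(chunk, voice)
        await communicate.save(f"{prefix}_{i:03d}.mp3")
```

To run it, something like `asyncio.run(synthesize(chunk_paragraphs(text)))` writes `chapter_000.mp3`, `chapter_001.mp3`, and so on. Keeping chunks well under any service limit is a design choice to reduce failed or truncated conversions; it is not a documented edge-tts requirement.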
-
tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with a pre-trained model (unofficial)
-
naturalspeech2-pytorch
Implementation of NaturalSpeech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
Project mention: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | /r/singularity | 2023-04-19
-
Project mention: [HELP] Speech2Speech translator with speaker voice preservation | /r/learnmachinelearning | 2023-05-20
Hey! I'm doing a somewhat similar project, but for TTS / voice cloning. This might not be too relevant for you, but it might be one way to solve your problem. We based our project on SpeechT5, which is a multimodal setup that can take in audio or text and output audio or text. It uses speaker embeddings to handle multiple speakers, so you could use Meta's S2ST to translate the audio and this model to preserve the voice by doing audio-to-audio speech conversion. Here's a Hugging Face tutorial that mentions speech conversion with SpeechT5: https://huggingface.co/blog/speecht5
-
NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
-
Project mention: Linux Audio Noise suppression using deep filtering in Rust | news.ycombinator.com | 2023-06-06
-
Python speech-synthesis related posts
- Show HN: WhisperFusion – Ultra-low latency conversations with an AI chatbot
- WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
- Microsoft releases Windows AI studio to run and fine tune models locally
- [D] What offline TTS Model is good enough for a realistic real-time task?
- [discussion] text to voice generation for textbooks (non-math part)
- StyleTTS2 – open-source Eleven Labs quality Text To Speech
- Ask HN: On-Device Text to Speech
-
A note from our sponsor - InfluxDB
www.influxdata.com | 18 Apr 2024
Index
What are some of the best open-source speech-synthesis projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | TTS | 28,959 |
2 | PaddleSpeech | 10,069 |
3 | NeMo | 9,951 |
4 | so-vits-svc-fork | 8,287 |
5 | espnet | 7,852 |
6 | EmotiVoice | 6,234 |
7 | vits | 6,206 |
8 | DiffSinger | 4,085 |
9 | Amphion | 3,864 |
10 | TensorFlowTTS | 3,690 |
11 | edge-tts | 3,503 |
12 | tacotron | 2,919 |
13 | lingvo | 2,781 |
14 | Tacotron-2 | 2,232 |
15 | WaveRNN | 2,086 |
16 | hifi-gan | 1,744 |
17 | kalliope | 1,694 |
18 | naturalspeech2-pytorch | 1,192 |
19 | SpeechT5 | 1,007 |
20 | autovc | 948 |
21 | NATSpeech | 944 |
22 | voicefixer | 896 |
23 | diffwave | 720 |