Our great sponsors
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
-
flowtron
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
-
FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
-
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
radtts
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over of Low Dimensional (F0 and Energy) Speech Attributes.
-
Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
-
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
-
STYLER
Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021 (by keonlee9420)
-
DiffSinger
PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech) (by keonlee9420)
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Tortoise TTS is supposed to be good. However inference can take a while if not on GPU's, so might not produce the real-time text-to-speech effect you want.
I don't know if it's the best, but Speechbrain is supposed to be state of the art.
Tacotron submitted: Mar 29, 2017 paper: https://arxiv.org/pdf/1703.10135.pdf github: https://github.com/keithito/tacotron (Not the official implementation but is the once cited the most)
FastSpeech2 submitted: Jun 8, 2020 paper: https://arxiv.org/pdf/2006.04558.pdf github: https://github.com/ming024/FastSpeech2 (Not the official implementation but is the once cited the most)
RadTTS submitted: Aug 18, 2021 (NVIDIA page, not Arxiv) paper: https://openreview.net/pdf?id=0NQwnnwAORi github: https://github.com/NVIDIA/radtts
DiffTTS (DiffSinger) submitted: Apr 3, 2021 paper: https://arxiv.org/pdf/2104.01409v1.pdf github: https://github.com/keonlee9420/DiffSinger