Silero V3: fast high-quality text-to-speech in 20 languages with 173 voices


silero-models

Mentions: 32 | Stars: 4,546 | Activity: 4.7 | Language: Jupyter Notebook

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Can you elaborate further? I am not familiar with the field, but their benchmarks here seem to show quality similar to Google's: https://github.com/snakers4/silero-models/wiki/Quality-Bench...
The only trick I can see being played is that Google was benchmarked in September 2020, so it has likely improved since then and they may not want to show that. Is CommonVoice a better standard to use when comparing these tools?

Voice-Cloning-App

Mentions: 18 | Stars: 1,247 | Activity: 0.0 | Language: Python

(Discontinued) A Python/PyTorch app for easily synthesising human voices

Nice to see this here - Silero is also the engine that powers the "dataset builder" for Voice-Cloning-App (https://github.com/BenAAndrew/Voice-Cloning-App), a GUI TTS system that modifies Tacotron2 slightly.
Just sharing the links in case others are new to the space and keen to tinker on some solid open-source offerings.

ttsprech

Mentions: 1 | Stars: 6 | Activity: 1.5 | Language: Python

Simple text2speech for the command line

Source is pretty much "embarrassingly simple":
   https://github.com/Grumbel/silero-test/blob/master/silero-test

razdel

Mentions: 1 | Stars: 243 | Activity: 2.1 | Language: Python

Rule-based token and sentence segmentation for the Russian language

Also, we have currently abandoned batching, so GPUs are not really required at all.
> the quality (as in: what I'm hearing, not a formally measured metric) is good but (YMMV) not as good as turtle.
I believe the compute required during training and inference … may differ by 3 or 4 orders of magnitude (!).
Also note that some speakers and languages simply sound better because of the higher quality of their source material and the amount of work and polish invested in them.
> it breaks with strange error messages if the text you feed it is too long
Well, there should be a warning somewhere, but the model only works with texts no longer than 512-1024 symbols.
> there is mention of "a model for text repunctuation and recapitalization", which I wonder if it could be used to break a very long text (eg a book) into pieces that can be digested by the tts engine
This model only restores some punctuation marks and capital letters.
There are libraries like razdel for this: https://github.com/natasha/razdel
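The reply suggests splitting long input (e.g. a book) into pieces that fit under the model's roughly 512-1024 symbol limit. A minimal sketch of that idea in plain Python, under stated assumptions: sentences are found with a naive regex, which is a swapped-in stand-in for a real segmenter such as razdel (razdel's API is not used here), and whole sentences are then greedily packed into chunks of at most `max_chars` characters. The names `split_sentences`, `chunk_text`, `_hard_split` and the 800-character default are hypothetical choices, not part of silero-models.

```python
import re

# Hypothetical helper, not part of silero-models or razdel: a naive
# splitter that breaks on whitespace following ., !, ? or an ellipsis.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences of text, stripped of whitespace."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def chunk_text(text: str, max_chars: int = 800) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    The 800 default is an assumed value inside the 512-1024 range quoted
    above. Sentences longer than max_chars fall back to _hard_split.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _hard_split(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece still fits in the open chunk
            else:
                if current:
                    chunks.append(current)  # close the full chunk
                current = piece  # start a new chunk with this piece
    if current:
        chunks.append(current)
    return chunks


def _hard_split(sentence: str, max_chars: int) -> list[str]:
    """Fallback for over-long sentences: split on spaces, then by length."""
    parts: list[str] = []
    current = ""
    for word in sentence.split():
        # A single word longer than max_chars is cut into fixed-size slices.
        while len(word) > max_chars:
            if current:
                parts.append(current)
                current = ""
            parts.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts
```

For example, `chunk_text("First sentence. Second one! Third?", max_chars=20)` packs the text into `["First sentence.", "Second one! Third?"]`. Each chunk could then be synthesised separately, e.g. with the `apply_tts` method used in the silero-models README examples, and the resulting audio concatenated. Packing whole sentences keeps chunk boundaries at natural pauses; splitting on spaces is only a fallback for unusually long sentences.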

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
