silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Can you elaborate further? I am not familiar with the field, but their benchmarks here seem to show quality similar to Google: https://github.com/snakers4/silero-models/wiki/Quality-Bench...
The only trick I can see being played is that Google was benchmarked in September 2020, so it has likely improved since then and they don't want to show that. Is CommonVoice a better standard to use when comparing these tools?
Nice to see this here - Silero is also the engine that powers the "dataset builder" for Voice-Cloning-App (https://github.com/BenAAndrew/Voice-Cloning-App), a GUI TTS system that modifies Tacotron2 slightly.
Just sharing the links in case others are new to the space and keen to tinker on some solid open-source offerings.
Source is pretty much "embarrassingly simple":
https://github.com/Grumbel/silero-test/blob/master/silero-test
Also, we have currently abandoned batching, so GPUs are not really required at all.
> the quality (as in: what I'm hearing, not a formally measured metric) is good but (YMMV) not as good as turtle.
I believe the compute required during training and inference … may differ by 3 or 4 orders of magnitude (!).
Also note that some speakers and languages just sound better because of the higher quality of the source material and the amount of work and polish invested.
> it breaks with strange error messages if the text you feed it is too long
Well, there should be a warning somewhere, but it only works with text no longer than 512-1024 characters.
> there is mention of "a model for text repunctuation and recapitalization", which I wonder if it could be used to break a very long text (eg a book) into pieces that can be digested by the tts engine
This model only restores some punctuation marks and capital letters; it does not split text into pieces.
For sentence splitting there are libraries like razdel - https://github.com/natasha/razdel
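For anyone who wants to feed a long text (e.g. a book) to the TTS engine, here is a minimal sketch of the chunking idea: split into sentences, then greedily pack whole sentences into pieces under the length limit. The function names `split_sentences` and `chunk_text` are hypothetical, the regex splitter is a naive stand-in (a proper segmenter such as razdel would do better on real text), and the default of 512 characters is just the lower end of the range mentioned above, not a documented limit.

```python
import re


def split_sentences(text):
    """Naive stand-in splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(text, max_chars=512):
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    max_chars=512 is an assumption taken from the lower bound quoted in the thread.
    """
    chunks = []
    current = ""
    for sentence in split_sentences(text):
        # A single sentence longer than the limit is hard-split as a last resort.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately and the resulting audio concatenated. Splitting on sentence boundaries rather than at a fixed offset should keep the intonation of each piece intact.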