Our great sponsors
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.
Interesting. So some random questions - how easy is it to make a new voice? What about a new voice in a new language? - ever looked at SAPI? Is it possible to make a SAPI bridge for this on windows? - how does it fit with other systems. Like coqui and RHvoice? https://github.com/RHVoice/RHVoice
I didnt not release trainy parts to build voices. I am considering, but there is so many packages already (coqui, espnet, piper, nemo, fairseq to name a few) that i focused on usability for now. Support for new languages is a different question. Everyone wants to train fancy neural nets. But support for new language is about writing rules and having language expertise. I did it for English (https://github.com/balacoon/en_us_normalization/tree/c1019cf878aa6baf25d6fff719cf418cca5a3107/production/classify). Doing it for all the other languages would probably take me a lifetime. Other speech synthesis solutions use 17-years old espeak for this purpose (https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md). I introduced the fallback to it in Balacoon too. But generally, it is outdated technology and I believe we should do better.
I didnt not release trainy parts to build voices. I am considering, but there is so many packages already (coqui, espnet, piper, nemo, fairseq to name a few) that i focused on usability for now. Support for new languages is a different question. Everyone wants to train fancy neural nets. But support for new language is about writing rules and having language expertise. I did it for English (https://github.com/balacoon/en_us_normalization/tree/c1019cf878aa6baf25d6fff719cf418cca5a3107/production/classify). Doing it for all the other languages would probably take me a lifetime. Other speech synthesis solutions use 17-years old espeak for this purpose (https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md). I introduced the fallback to it in Balacoon too. But generally, it is outdated technology and I believe we should do better.
My own favorites other than Balacoon would be Piper (https://github.com/rhasspy/piper) and espnet (https://espnet.github.io/espnet/notebook/espnet2_tts_realtime_demo.html).