-
FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
-
voice100
Voice100 includes neural TTS/ASR models. Inference is cheap because the models are tiny and rely only on CNNs, with no autoregression.
As for synthesizing text in your own voice: you can look into Real Time Voice Cloning or maybe FastSpeech2, but I am not sure whether either can be used with conlangs (and, this being ML, you need a great deal of training data to get anything interesting).
If you want to use your own romanization, then yes, you first need some way to do grapheme-to-phoneme transcription. I dug around a bit and found something fairly basic where you can easily write your own phonemizer: https://github.com/kaiidams/voice100. I'm not sure how good the model is, since it's designed to run on small devices, but you can play with it.
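To make the grapheme-to-phoneme step more concrete, here is a minimal sketch of a rule-based phonemizer in Python. Everything in it is an assumption for illustration: the `G2P_RULES` table describes a made-up romanization, and `phonemize` is a hypothetical helper, not part of voice100 or any other library. It only shows the general idea of greedy longest-match conversion; you would still need to adapt the output to whatever phoneme format your TTS model expects.

```python
# Hypothetical romanization-to-IPA table for an invented conlang.
# Every entry is an illustrative assumption; replace it with your own rules.
G2P_RULES = {
    "sh": "ʃ",
    "ch": "tʃ",
    "th": "θ",
    "ng": "ŋ",
    "aa": "aː",
    "a": "a",
    "e": "e",
    "i": "i",
    "o": "o",
    "u": "u",
    "k": "k",
    "l": "l",
    "m": "m",
    "n": "n",
    "r": "r",
    "s": "s",
    "t": "t",
}

# Length of the longest grapheme, so multi-letter rules are tried first.
_MAX_LEN = max(len(grapheme) for grapheme in G2P_RULES)


def phonemize(text):
    """Convert romanized text to a list of phoneme symbols (greedy longest match).

    Words are separated by a single " " entry in the returned list.
    Raises ValueError when a character has no matching rule.
    """
    phonemes = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest candidate first so "sh" wins over "s".
            for size in range(min(_MAX_LEN, len(word) - i), 0, -1):
                chunk = word[i:i + size]
                if chunk in G2P_RULES:
                    phonemes.append(G2P_RULES[chunk])
                    i += size
                    break
            else:
                raise ValueError(f"no rule for {word[i]!r} in {word!r}")
        phonemes.append(" ")  # word boundary marker
    return phonemes[:-1]  # drop the trailing boundary marker


if __name__ == "__main__":
    print("".join(phonemize("tanga kish")))
```

Trying longer graphemes first keeps digraphs such as "sh" or "ng" from being split into two single-letter phonemes. Raising an error on unknown characters is a deliberate choice in this sketch, so that gaps in the rule table show up early instead of silently producing wrong input for the model.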