Bark and Tortoise work fairly well. Bark does super fast inference[1] on my M1.
[1] https://github.com/SaladTechnologies/bark
Interesting. Just a couple of hours ago I came across MetaVoice-1B [0] (demo [1]) and was amazed by the quality of their TTS in English (sadly, no other languages are available).
It would be great if this turned out to be the year that high-quality open-source TTS and ASR models appear that can run in real time on an Nvidia RTX 40x0 or 30x0. On CPU, even better.
[0] https://github.com/metavoiceio/metavoice-src
[1] https://ttsdemo.themetavoice.xyz/
I've used coqui.ai's TTS models [0] and library [1] with great success. I was able to render a cloned voice in about 80% of the audio clip's duration (i.e., slightly faster than real time), and I believe you can also stream the output. Do note the model license for XTTS: it is a custom license they wrote themselves, and it comes with some restrictions.
[0] https://huggingface.co/coqui/XTTS-v2
[1] https://github.com/coqui-ai/TTS
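"Running in real time" and "rendered in about 80% of the audio clip length" both come down to the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where anything below 1.0 is faster than real time. Below is a minimal, model-agnostic sketch for measuring it. `synthesize` is a hypothetical stand-in for whatever TTS call you use (Bark, XTTS, MetaVoice, etc.) and is assumed to return raw audio samples; the dummy synthesizer and sample rate are illustrative only and not tied to any of the libraries above.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Ratio of wall-clock synthesis time to the duration of the audio produced.

    Below 1.0 means faster than real time; 0.8 corresponds to
    "rendered in about 80% of the audio clip length".
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]], text: str, sample_rate: int
) -> float:
    """Time a hypothetical `synthesize(text) -> samples` callable and return its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    # Audio duration follows from the sample count and the sample rate.
    audio_seconds = len(samples) / sample_rate
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # 8 s of compute for a 10 s clip gives an RTF of 0.8.
    print(real_time_factor(8.0, 10.0))

    # Dummy stand-in for a real TTS model: one second of silence per call.
    def fake_tts(text: str) -> list[float]:
        return [0.0] * 24_000

    rtf = measure_rtf(fake_tts, "Hello there", sample_rate=24_000)
    print(f"RTF: {rtf:.4f}  real-time capable: {rtf < 1.0}")
```

To benchmark a real model, wrap its synthesis call in a function matching the `synthesize(text) -> samples` shape and pass that model's actual output sample rate. For streaming setups, the time to the first audio chunk matters as much as the overall RTF.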