Base TTS (Amazon): The largest text-to-speech model to-date

bark

Mentions: 1 · Stars: 9 · Activity: 6.3 · Language: Python

An inference server for Bark (by SaladTechnologies)

Bark and Tortoise work fairly well. Bark does super fast inference[1] on my M1.
[1] https://github.com/SaladTechnologies/bark

metavoice-src

Mentions: 4 · Stars: 2,998 · Activity: 8.2 · Language: Python

Foundational model for human-like, expressive TTS

Interesting. Just a couple of hours ago I came across MetaVoice-1B [0] (Demo [1]) and was amazed by the quality of their TTS in English (sadly no other languages available).
If this turns out to be the year when high-quality open-source TTS and ASR models appear that can run in real time on an Nvidia RTX 40x0 or 30x0, that would be great. On CPU, even better.
[0] https://github.com/metavoiceio/metavoice-src
[1] https://ttsdemo.themetavoice.xyz/

TTS

Mentions: 231 · Stars: 29,174 · Activity: 9.5 · Language: Python

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

I've used coqui.ai's TTS models[0] and library[1] with great success. I was able to render a cloned voice in about 80% of the audio clip's duration, and I believe you can also stream the response. Do note the model license for XTTS: it is one they wrote themselves, and it has some restrictions.
[0] https://huggingface.co/coqui/XTTS-v2
[1] https://github.com/coqui-ai/TTS
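
The "about 80% of the audio clip length" figure in the comment above is what is usually called a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where anything below 1.0 is faster than real time. Below is a minimal sketch of how such a number could be measured. The names `real_time_factor`, `measure_rtf`, and `fake_synthesize` are hypothetical and are not part of the coqui TTS library; the stub stands in for a real model call, and the 24 kHz sample rate is only an assumption for illustration.

```python
import time
from typing import Callable, Sequence, Tuple


def real_time_factor(elapsed_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Tuple[Sequence[float], int]], text: str
) -> float:
    """Time one call to `synthesize` and return its real-time factor.

    `synthesize` is any callable that returns (samples, sample_rate).
    """
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


def fake_synthesize(text: str) -> Tuple[Sequence[float], int]:
    # Hypothetical stand-in for a real TTS call: one second of silence
    # at an assumed 24 kHz sample rate.
    return [0.0] * 24_000, 24_000


if __name__ == "__main__":
    # 8 seconds of compute for 10 seconds of audio gives an RTF of 0.8.
    print(real_time_factor(8.0, 240_000, 24_000))
    print(measure_rtf(fake_synthesize, "hello"))
```

To measure a real model, replace `fake_synthesize` with a wrapper around the library's own synthesis call that returns the waveform and its sample rate. The first call often includes model loading, so timing a second call tends to give a more representative figure.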

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
