Balacoon: python package for text-to-speech

This page summarizes the projects mentioned and recommended in the original post on /r/Python

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • RHVoice

    a free and open source speech synthesizer for Russian and other languages

  • Interesting. So some random questions - how easy is it to make a new voice? What about a new voice in a new language? - ever looked at SAPI? Is it possible to make a SAPI bridge for this on windows? - how does it fit with other systems. Like coqui and RHvoice? https://github.com/RHVoice/RHVoice

  • en_us_normalization

    Grammars for en_us text normalization

  • I didnt not release trainy parts to build voices. I am considering, but there is so many packages already (coqui, espnet, piper, nemo, fairseq to name a few) that i focused on usability for now. Support for new languages is a different question. Everyone wants to train fancy neural nets. But support for new language is about writing rules and having language expertise. I did it for English (https://github.com/balacoon/en_us_normalization/tree/c1019cf878aa6baf25d6fff719cf418cca5a3107/production/classify). Doing it for all the other languages would probably take me a lifetime. Other speech synthesis solutions use 17-years old espeak for this purpose (https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md). I introduced the fallback to it in Balacoon too. But generally, it is outdated technology and I believe we should do better.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • espeak-ng

    eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.

  • I didnt not release trainy parts to build voices. I am considering, but there is so many packages already (coqui, espnet, piper, nemo, fairseq to name a few) that i focused on usability for now. Support for new languages is a different question. Everyone wants to train fancy neural nets. But support for new language is about writing rules and having language expertise. I did it for English (https://github.com/balacoon/en_us_normalization/tree/c1019cf878aa6baf25d6fff719cf418cca5a3107/production/classify). Doing it for all the other languages would probably take me a lifetime. Other speech synthesis solutions use 17-years old espeak for this purpose (https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md). I introduced the fallback to it in Balacoon too. But generally, it is outdated technology and I believe we should do better.

  • piper

    A fast, local neural text to speech system (by rhasspy)

  • My own favorites other than Balacoon would be Piper (https://github.com/rhasspy/piper) and espnet (https://espnet.github.io/espnet/notebook/espnet2_tts_realtime_demo.html).

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts