emotivoice-cli
 | emotivoice-cli | LiteratureForEyesAndEars
---|---|---
Mentions | 1 | 1
Stars | 5 | 0
Growth | - | -
Activity | 4.9 | 4.7
Last commit | 3 days ago | 5 months ago
Language | JavaScript | Python
License | MIT License | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
emotivoice-cli
WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
Interested to see how it performs for Mandarin Chinese speech synthesis, especially with prosody and emotion. The highest quality open source model I've seen so far is EmotiVoice[0], which I've made a CLI wrapper around to generate audio for flashcards.[1] For EmotiVoice, you can apparently also clone your own voice with a GPU, but I have not tested this.[2]
[0] https://github.com/netease-youdao/EmotiVoice
[1] https://github.com/siraben/emotivoice-cli
[2] https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Clon...
LiteratureForEyesAndEars
WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
I have forced alignments, too.
E.g. for The True Story of Ah Q: https://github.com/Yorwba/LiteratureForEyesAndEars/tree/mast... The .align.json file is my homegrown alignment format, the .srt files are standard subtitles, and the .txt file is the text. Note that in some places I have [[original text||what it is pronounced as]] annotations to make the forced alignment work better (e.g. the "." in LibriVox.org, pronounced as 點 "diǎn" in Mandarin). Also, cmn-Hans is the same thing transliterated into Simplified Chinese.
The corresponding LibriVox URL is predictably https://librivox.org/the-true-story-of-ah-q-by-xun-lu/
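As a rough illustration of the [[original text||what it is pronounced as]] annotations mentioned above, here is a minimal Python sketch that splits an annotated line into the text to display and the text to feed to an aligner. The regular expression and the function name `split_annotations` are assumptions made for this example; the repository's own tooling may handle the markup differently.

```python
import re

# Hypothetical pattern for [[original text||what it is pronounced as]].
# Assumed for illustration only; not taken from the repository.
ANNOTATION = re.compile(r"\[\[(.*?)\|\|(.*?)\]\]")


def split_annotations(text):
    """Return (display_text, spoken_text) for a line with annotations.

    display_text keeps the original spelling; spoken_text substitutes
    the pronunciation that the forced aligner should match.
    """
    display = ANNOTATION.sub(lambda m: m.group(1), text)
    spoken = ANNOTATION.sub(lambda m: m.group(2), text)
    return display, spoken


display, spoken = split_annotations("LibriVox[[.||點]]org")
print(display)  # LibriVox.org
print(spoken)   # LibriVox點org
```

Lines without annotations pass through unchanged in both outputs, so the same function could be applied to a whole .txt file line by line.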
What are some alternatives?
WhisperSpeech - An Open Source text-to-speech system built by inverting Whisper.