EmotiVoice vs voice100

| | EmotiVoice | voice100 |
|---|---|---|
| Mentions | 5 | 1 |
| Stars | 6,369 | 25 |
| Growth | - | - |
| Activity | 8.9 | 4.9 |
| Latest commit | 3 months ago | 6 months ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
EmotiVoice
- FLaNK Stack Weekly 12 February 2024
- WhisperSpeech - An Open Source text-to-speech system built by inverting Whisper
I'm interested to see how it performs for Mandarin Chinese speech synthesis, especially with prosody and emotion. The highest-quality open-source model I've seen so far is EmotiVoice[0], which I've made a CLI wrapper around to generate audio for flashcards.[1] With EmotiVoice you can apparently also clone your own voice with a GPU, but I have not tested this.[2]
[0] https://github.com/netease-youdao/EmotiVoice
[1] https://github.com/siraben/emotivoice-cli
[2] https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Clon...
- Microsoft releases Windows AI Studio to run and fine-tune models locally
Interesting. I'll have to check to be sure, but I think something may be happening automagically if you have reasonably up-to-date NVIDIA drivers on the host OS, because I was able to run the EmotiVoice TTS Docker image (which requires an NVIDIA GPU) from WSL2.
https://github.com/netease-youdao/EmotiVoice
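For reference, a minimal sketch of how GPU passthrough can be exercised from WSL2 with a CUDA-requiring container. The image name and port below are placeholders, not the actual EmotiVoice image; consult the EmotiVoice README for the real values.

```shell
# Check that the WSL2 guest sees the host GPU via the NVIDIA driver shim.
nvidia-smi

# Run a CUDA-enabled container with GPU access. Docker's --gpus flag
# requires the NVIDIA Container Toolkit on the host. <image> and the
# port mapping are placeholders.
docker run --rm --gpus all -p 8501:8501 <image>
```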
- FLaNK Stack Weekly for 13 November 2023
- EmotiVoice: A Multi-Voice and Prompt-Controlled TTS Engine
voice100
- Voice-cloning library for conlangs?
If you would like to use your romanization, yes: first you need some way to perform grapheme-to-phoneme transcription. I dug around for a bit and found something that looks pretty basic, where you can easily write your own phonemizer: https://github.com/kaiidams/voice100. I'm not sure how good this model is, as it's made to work on small devices, but you may play with it.
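To make the "write your own phonemizer" idea concrete, here is a minimal sketch of a rule-based grapheme-to-phoneme mapping for a hypothetical conlang romanization. The rule table and phoneme symbols are invented for illustration; voice100's actual phonemizer interface may differ.

```python
# Longest-match-first G2P rules: digraphs must come before single letters
# so that "ch" is matched before "c" or "h" would be tried.
G2P_RULES = [
    ("ch", "tʃ"),
    ("sh", "ʃ"),
    ("ng", "ŋ"),
    ("a", "a"),
    ("e", "e"),
    ("i", "i"),
    ("o", "o"),
    ("u", "u"),
    ("k", "k"),
    ("n", "n"),
    ("s", "s"),
    ("t", "t"),
]

def phonemize(word: str) -> list[str]:
    """Greedy transcription of a romanized word into a phoneme list."""
    phones = []
    i = 0
    while i < len(word):
        for graph, phone in G2P_RULES:
            if word.startswith(graph, i):
                phones.append(phone)
                i += len(graph)
                break
        else:
            # Unknown grapheme: skip it (a real phonemizer might raise).
            i += 1
    return phones

print(phonemize("chanto"))  # ['tʃ', 'a', 'n', 't', 'o']
```

The rule list plays the role of the language-specific part you would supply; a TTS model trained on the resulting phoneme sequences never needs to know the romanization itself.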
What are some alternatives?
Cgml - GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.
FastSpeech2 - An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
TTS - 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
draw-a-ui - Draw a mockup and generate html for it
WaveRNN - WaveRNN Vocoder + TTS
MockingBird - 🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
hifi-gan - HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
lhotse - Tools for handling speech data in machine learning projects.
wavegrad - A fast, high-quality neural vocoder.
clipea - Like Clippy but for the CLI. A blazing fast AI helper for your command line
TensorFlowTTS - Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)