WhisperSpeech
RHVoice
| | WhisperSpeech | RHVoice |
|---|---|---|
| Mentions | 5 | 13 |
| Stars | 3,417 | 1,439 |
| Growth | 4.7% | 1.9% |
| Activity | 9.2 | 8.1 |
| Last commit | 7 days ago | 27 days ago |
| Language | Jupyter Notebook | C++ |
| License | MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
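The exact formulas behind these numbers are not given here, so the following is only a rough sketch of how such metrics could be computed. It assumes an exponential decay with a 30-day half-life for weighting commits and a simple percentile rank scaled to 0-10 for the activity index; the function names and the half-life are illustrative assumptions, not the site's actual method.

```python
from datetime import date


def weighted_commit_score(commit_dates, today, half_life_days=30.0):
    """Sum commit weights so that recent commits count more than older ones.

    Assumption: a commit that is `half_life_days` old counts half as much
    as a commit made today. The real weighting scheme is not documented here.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_index(score, all_scores):
    """Map a raw score to a 0-10 scale by percentile rank among tracked projects.

    A value of 9.0 means the project scores higher than 90% of the projects,
    i.e. it sits in the top 10%.
    """
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)


def month_over_month_growth(stars_now, stars_month_ago):
    """Percentage change in stars compared with one month earlier."""
    if stars_month_ago <= 0:
        raise ValueError("stars_month_ago must be positive")
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago
```

For instance, among 100 hypothetical projects with raw scores 1 through 100, `activity_index(91, list(range(1, 101)))` yields 9.0, which matches the "top 10%" reading of an activity of 9.0 described above.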
WhisperSpeech
-
OpenVoice: Versatile Instant Voice Cloning
I haven't tried OpenVoice, but I did try WhisperSpeech and it will do the same thing. You can optionally pass in a file with a reference voice, and the TTS uses it.
https://github.com/collabora/whisperspeech
I found it kind of creepy hearing it in my own voice. I also tried it with the voice of a friend of mine who has a French Canadian accent, and strangely the output didn't have his accent.
-
Show HN: WhisperFusion – Ultra-low latency conversations with an AI chatbot
- WhisperSpeech for the text-to-speech - https://github.com/collabora/WhisperSpeech
and an LLM (phi-2, Mistral, etc.) in between
-
WhisperFusion: Ultra-low latency conversations with an AI chatbot
Hi, I used the [WhisperSpeech](https://github.com/collabora/WhisperSpeech) model for the TTS part after I did some serious torch.compile optimizations to bring the latency down. The Whisper speech recognition and the LLM were optimized through TensorRT-LLM by Marcus and Vineet.
It's not perfect but I am still extremely proud of how it came out. :)
- WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
-
StyleTTS2 – open-source Eleven Labs quality Text To Speech
I think you’re talking about just using Whisper to annotate audio for a TTS pipeline, but someone from Collabora actually created a TTS model directly from Whisper embeddings: https://github.com/collabora/WhisperSpeech
RHVoice
- StyleTTS2 – open-source Eleven Labs quality Text To Speech
-
⟳ 4 apps added, 28 updated at f-droid.org
RHVoice - a free and open source speech synthesizer (version 1.8.0): TTS engine with extended language support (incl. Russian)
-
Balacoon: python package for text-to-speech
Interesting. So some random questions: How easy is it to make a new voice? What about a new voice in a new language? Ever looked at SAPI? Is it possible to make a SAPI bridge for this on Windows? How does it fit with other systems, like Coqui and RHVoice? https://github.com/RHVoice/RHVoice
-
Extra voices for windows?
I like the voices from RHVoice: https://rhvoice.org
-
Translate app with speech to text, text to speech?
I use this: https://rhvoice.org/
-
Major Text to Speech upgrades for 64 bit devices
I have tried RHVoice on Android, and it works okay.
-
TTS engine that allows me to add my own MSI files
Try these and see which works: https://github.com/RHVoice/RHVoice
-
⟳ 2 apps added, 55 updated at f-droid.org
RHVoice - a free and open source speech synthesizer (version 1.6.0): TTS engine with extended language support (incl. Russian)
- Dicio: Free and open source voice assistant for Android
-
HERE WeGo apparently no longer supports voice navigation on devices without Google, and OSM isn't good enough in my area. Do I have any options besides switching to iOS, getting a standalone GPS, or using Google products?
Also, there are instructions on how to create new voices for RHVoice, if you're interested. It'll sound really unnatural, but at least it's not Google! https://github.com/RHVoice/RHVoice/wiki
What are some alternatives?
piper - A fast, local neural text to speech system
espeak-ng - eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
WhisperFusion - WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
whisper-ctranslate2 - Whisper command line client compatible with original OpenAI client based on CTranslate2.
TensorVox - Desktop application for neural speech synthesis written in C++
monotonic_align - Monotonic Alignment Search
NeMo - A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech in the Wild
TTS - Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
luci - LuCI - OpenWrt Configuration Interface