Metric | WhisperSpeech | OpenVoice
---|---|---
Mentions | 5 | 14
Stars | 3,417 | 24,948
Growth | 4.7% | 34.1%
Activity | 9.2 | 8.8
Latest commit | 7 days ago | 11 days ago
Language | Jupyter Notebook | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
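The page does not publish the formulas behind these numbers, so the following is only a minimal sketch of how such metrics could be computed. The exponential half-life weighting, the percentile-to-rating mapping, and all function and parameter names are assumptions made for illustration, not the site's actual method.

```python
from bisect import bisect_left


def monthly_growth(stars_now, stars_month_ago):
    """Month-over-month growth in stars, as a percentage."""
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def recency_weighted_score(commit_ages_days, half_life_days=30.0):
    """Raw activity score: each commit counts less the older it is.

    Assumption: a commit's weight halves every `half_life_days` days.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_rating(score, all_scores):
    """Map a raw score to a 0-10 rating by percentile rank.

    Assumption: rating = 10 * (share of tracked projects with a strictly
    lower score), so 9.0 means the project is in the top 10%.
    """
    ranked = sorted(all_scores)
    below = bisect_left(ranked, score)  # projects scoring strictly lower
    return 10.0 * below / len(ranked)
```

Under these assumptions, a project whose raw score beats 90 of 100 tracked projects would get `activity_rating(...) == 9.0`, which matches the "top 10%" reading given above.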
WhisperSpeech
- OpenVoice: Versatile Instant Voice Cloning
I haven't tried OpenVoice, but I did try WhisperSpeech and it will do the same thing. You can optionally pass in a file with a reference voice, and the TTS uses it.
https://github.com/collabora/whisperspeech
I found it kind of creepy hearing it in my own voice. I also tried it on a friend of mine who has a French Canadian accent, and strangely the output didn't have his accent.
- Show HN: WhisperFusion – Ultra-low latency conversations with an AI chatbot
- WhisperSpeech for the text-to-speech - https://github.com/collabora/WhisperSpeech
and an LLM (phi-2, Mistral, etc.) in between
- WhisperFusion: Ultra-low latency conversations with an AI chatbot
Hi, I used the [WhisperSpeech](https://github.com/collabora/WhisperSpeech) model for the TTS part after I did some serious torch.compile optimizations to bring the latency down. The Whisper speech recognition and the LLM were optimized through TensorRT-LLM by Marcus and Vineet.
It's not perfect but I am still extremely proud of how it came out. :)
- WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
- StyleTTS2 – open-source Eleven Labs quality Text To Speech
I think you’re talking about just using Whisper to annotate audio for a TTS pipeline, but someone from Collabora actually created a TTS model directly from Whisper embeddings: https://github.com/collabora/WhisperSpeech
OpenVoice
- OpenVoice: Instant Voice Cloning
- OpenVoice V2 Released
- Ask HN: Voice ID adoption at financial institutions
Given the inevitability of easy voice cloning[1], it seems irresponsible to be using voice as a positive authentication signal.
Unfortunately, major US financial institutions seem to be ramping up adoption of this technology[2].
Am I missing something?
[1] https://github.com/myshell-ai/OpenVoice
- OpenAI: Navigating the Challenges and Opportunities of Synthetic Voices
They might have been forced to give a signal after this rose on HN today:
https://research.myshell.ai/open-voice
https://news.ycombinator.com/item?id=39861578
- OpenVoice: Versatile Instant Voice Cloning
- FLaNK Weekly 31 December 2023
What are some alternatives?
piper - A fast, local neural text to speech system
tortoise-tts - A multi-voice TTS system trained with an emphasis on quality
WhisperFusion - WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
whisper-ctranslate2 - Whisper command line client compatible with original OpenAI client based on CTranslate2.
FLiPStackWeekly - FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
monotonic_align - Monotonic Alignment Search
Stirling-PDF - #1 Locally hosted web application that allows you to perform various operations on PDF files
VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech in the Wild
FLaNK-Ice - Apache Iceberg - Cloud Data Lakehouse
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
JavaOnRaspberryPi - Sources and scripts for the book "Getting started with Java on the Raspberry Pi"