WhisperSpeech vs vocode-python
Metric | WhisperSpeech | vocode-python
---|---|---
Mentions | 5 | 9
Stars | 3,417 | 2,330
Growth | 4.7% | 4.8%
Activity | 9.2 | 9.1
Last commit | 7 days ago | 10 days ago
Language | Jupyter Notebook | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
WhisperSpeech
- OpenVoice: Versatile Instant Voice Cloning
I haven't tried OpenVoice, but I did try WhisperSpeech, and it will do the same thing. You can optionally pass in a file with a reference voice, and the TTS uses it.
https://github.com/collabora/whisperspeech
I found it kind of creepy to hear it in my own voice. I also tried it with a friend of mine who has a French Canadian accent, and strangely the output didn't have his accent.
- Show HN: WhisperFusion – Ultra-low latency conversations with an AI chatbot
- WhisperSpeech for the text-to-speech - https://github.com/collabora/WhisperSpeech
and an LLM (phi-2, Mistral, etc.) in between
- WhisperFusion: Ultra-low latency conversations with an AI chatbot
Hi, I used the [WhisperSpeech](https://github.com/collabora/WhisperSpeech) model for the TTS part after I did some serious torch.compile optimizations to bring the latency down. The Whisper speech recognition and the LLM were optimized through TensorRT-LLM by Marcus and Vineet.
It's not perfect but I am still extremely proud of how it came out. :)
- WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
- StyleTTS2 – open-source Eleven Labs quality Text To Speech
I think you're talking about just using Whisper to annotate audio for a TTS pipeline, but someone from Collabora actually created a TTS model directly from Whisper embeddings: https://github.com/collabora/WhisperSpeech
vocode-python
- Launch HN: Retell AI (YC W24) – Conversational Speech API for Your LLM
- Ask HN: Who is hiring? (February 2024)
Vocode || Engineering (multiple roles) || SF/Remote || Full-time/Contract || https://vocode.dev
- Show HN: WhisperFusion – Ultra-low latency conversations with an AI chatbot
- April 2023
Vocode, an open source library for building LLM applications you can talk to. (https://github.com/vocodedev/vocode-python)
- Serverless voice chat with Vicuna-13B
Coqui also looks interesting.
https://github.com/coqui-ai/TTS
Support for it was recently added to vocode:
https://github.com/vocodedev/vocode-python/pull/56
- Vocode is an open source library that makes it easy to build voice-based LLM apps
Direct link to the code: https://github.com/vocodedev/vocode-python
- Show HN: Vocode (YC W23) Is Back with an April Fools Special – PrankGPT
Hey everyone! We are so grateful for the warm reception from our launch this week.
We're back with PrankGPT (the origin story of Vocode), rebuilt using our library: https://github.com/vocodedev/vocode-python
The source code for the backend is public and available on Replit for you to check out.
- Show HN: YakGPT – A locally running, hands-free ChatGPT UI
Given that Vocode (realtime audio, LLM, etc.) came out a few days ago, could you speak to how yours compares to it?
https://github.com/vocodedev/vocode-python
- Gen Z GPT hotline demo
It's a demo for their new open source library integrating several AI tools: https://github.com/vocodedev/vocode-python
What are some alternatives?
piper - A fast, local neural text to speech system
bark - 🔊 Text-Prompted Generative Audio Model
WhisperFusion - WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
Flowise - Drag & drop UI to build your customized LLM flow
whisper-ctranslate2 - Whisper command line client compatible with original OpenAI client based on CTranslate2.
PentestGPT - A GPT-empowered penetration testing tool
monotonic_align - Monotonic Alignment Search
ChatGPT-Next-Web - A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini application with one click.
VoiceCraft - Zero-Shot Speech Editing and Text-to-Speech in the Wild
textSQL
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
prompt-engineering - ChatGPT Prompt Engineering for Developers - deeplearning.ai