WhisperLive
LiteratureForEyesAndEars | WhisperLive | |
---|---|---|
1 | 4 | |
0 | 1,350 | |
- | 12.6% | |
4.7 | 9.4 | |
5 months ago | 7 days ago | |
Python | Python | |
- | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
LiteratureForEyesAndEars
-
WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
I have forced alignments, too.
E.g. for the True Story of Ah Q https://github.com/Yorwba/LiteratureForEyesAndEars/tree/mast... .align.json is my homegrown alignment format, .srt are standard subtitles, .txt is the text, but note that in some places I have [[original text||what it is pronounced as]] annotations to make the forced alignment work better. (E.g. the "." in LibriVox.org, pronounced as 點 "diǎn" in Mandarin.) Oh, and cmn-Hans is the same thing transliterated into Simplified Chinese.
The corresponding LibriVox URL is predictably https://librivox.org/the-true-story-of-ah-q-by-xun-lu/
WhisperLive
-
Show HN: WhisperFusion – Ultra-low latency conversations with an AI chatbot
Everything runs locally, we use:
- WhisperLive for the transcription - https://github.com/collabora/WhisperLive
-
WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
Check out WhisperLive: https://github.com/collabora/WhisperLive
If you're grappling with the slow march from cool tech demos to real-world language model apps, you might wanna check out WhisperLive. It's this rad open-source project that’s all about leveraging Whisper models for slick live transcription. Think real-time, on-the-fly translated captions for those global meetups. It's a neat example of practical, user-focused tech in action. Dive into the details on their GitHub page
-
Whisper: Nvidia RTX 4090 vs. M1 Pro with MLX
https://github.com/collabora/WhisperLive
The is another one that uses huggingface's implementation, but I haven't tried it since my spec doesn't support flash-att2
-
Triple Threat: The Power of Transcription, Summary, and Translation
Curious to see how this works? Check out our demo page - https://col.la/transcription to generate your own transcription, summary, and translation, or use our browser extension - https://github.com/collabora/WhisperLive to get live transcriptions.
What are some alternatives?
cog-whisper-diarization - Cog implementation of transcribing + diarization pipeline with Whisper & Pyannote
whisper-writer - 💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
obs-zoom-and-follow - Dynamic zoom and mouse tracking script for OBS Studio
gpt_chatbot - This chatbot lets you use your microphone to communicate with GPT-4. It uses the OpenAI text to speech to respond with a voice. It uses Pinecone to store long term information and retrieves it to create context. API keys for OpenAI and Pinecone required. Tested on Windows
whisper_streaming - Whisper realtime streaming for long speech-to-text transcription and translation
gpt-voice-conversation-chatbot - Allows you to have an engaging and safely emotive spoken / CLI conversation with the AI ChatGPT / GPT-4 while giving you the option to let it remember things discussed.
WhisperFusion - WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide a seamless conversations with an AI.
PaddleSpeech - Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
LiveWhisper - A nearly-live implementation of OpenAI's Whisper, using sounddevice. Requires existing Whisper install.
mlx - MLX: An array framework for Apple silicon
spokestack-python - Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application with a focus on embedded systems.
espnet - End-to-End Speech Processing Toolkit