WhisperFusion
WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
Automatically take a screenshot and feed it to https://github.com/vikhyat/moondream or similar? Doable. But while very impressive, the results are a bit of a mixed bag (some hallucinations).
Oh this is neat! I was wondering how to get whisper to stream-transcribe well. I have a similar project using whisper + styletts with the same goal of minimal delay: https://github.com/lxe/llm-companion
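One common trick for low-delay stream transcription (used by several streaming-Whisper projects; this is an assumption for illustration, not a description of llm-companion's or WhisperLive's internals) is "local agreement": re-transcribe a growing audio buffer and only emit the words that two consecutive hypotheses agree on. A minimal sketch:

```python
def confirmed_prefix(prev_words, curr_words):
    """Return the longest common prefix of two consecutive transcription
    hypotheses. Only these words are emitted downstream; the tail of the
    newer hypothesis may still change as more audio arrives."""
    out = []
    for a, b in zip(prev_words, curr_words):
        if a != b:
            break
        out.append(a)
    return out

# Example: the last word is still unstable, so it is held back.
stable = confirmed_prefix(
    "i have a similar pro".split(),
    "i have a similar project".split(),
)
print(stable)  # → ['i', 'have', 'a', 'similar']
```

The trade-off is a small extra delay (roughly one re-transcription interval) in exchange for never having to retract words already sent to the LLM.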
Everything runs locally, we use:
- WhisperLive for the transcription - https://github.com/collabora/WhisperLive
- WhisperSpeech for the text-to-speech - https://github.com/collabora/WhisperSpeech
and an LLM (phi-2, Mistral, etc.) in between
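Low latency in a pipeline like this comes from overlapping the stages rather than waiting for the full LLM reply before speaking. A hedged sketch of one piece of that (the function name and the sentence-splitting heuristic are illustrative, not the actual WhisperFusion code): chunk the streamed LLM tokens at sentence boundaries so the TTS stage can start on the first sentence while the rest is still generating.

```python
def sentence_chunks(token_stream):
    """Group streamed LLM tokens into sentence-sized chunks so the TTS
    stage can start speaking before the full reply has been generated."""
    buf = []
    for tok in token_stream:
        buf.append(tok)
        if tok.rstrip().endswith((".", "!", "?")):
            yield "".join(buf)
            buf = []
    if buf:  # flush any trailing partial sentence
        yield "".join(buf)

tokens = ["Sure", ".", " Here", " is", " the", " answer", "."]
print(list(sentence_chunks(tokens)))  # → ['Sure.', ' Here is the answer.']
```

In practice each chunk would be pushed onto a queue consumed by the TTS thread, so transcription, generation, and speech all run concurrently.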
The code is all released already. You find it here: https://github.com/rwth-i6/returnn-experiments/tree/master/2...
This is TensorFlow-based. But I also have another PyTorch-based implementation, also public (inside our other repo, i6_experiments). It's currently not so easy to set up, but I'm working on a simpler pipeline in PyTorch.
We don't have the models online yet, but we can upload them later. I'm not sure how useful they are outside of research, though, as they are tuned for those specific research tasks (Librispeech, Tedlium) and probably don't perform too well on other data.
Related posts
- WhisperFusion: Ultra-low latency conversations with an AI chatbot
- WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
- Microsoft releases Windows AI studio to run and fine tune models locally
- Whisper: Nvidia RTX 4090 vs. M1 Pro with MLX
- [D] What offline TTS Model is good enough for a realistic real-time task?