espnet vs WhisperLive

WhisperLive

A nearly-live implementation of OpenAI's Whisper. (by collabora)

dictation obs openai text-to-speech Translation voice-recognition Whisper

Project         espnet               WhisperLive
Mentions        15                   4
Stars           7,892                1,180
Growth          1.3%                 11.9%
Activity        10.0                 9.4
Latest Commit   1 day ago            25 days ago
Language        Python               Python
License         Apache License 2.0   MIT License

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
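
As a rough illustration of the Growth column, the sketch below computes a month-over-month percentage change in stars and inverts it to estimate the previous month's count. The function names are hypothetical, and the formula is an assumption about what "month over month growth" means here; the page does not publish its exact calculation.

```python
def month_over_month_growth(stars_prev: int, stars_now: int) -> float:
    """Percentage change in star count from one month to the next.

    Assumed formula (not confirmed by the source):
    (stars_now - stars_prev) / stars_prev * 100
    """
    if stars_prev <= 0:
        raise ValueError("stars_prev must be positive")
    return (stars_now - stars_prev) / stars_prev * 100.0


def implied_previous_stars(stars_now: int, growth_percent: float) -> float:
    """Invert the assumed growth formula to estimate last month's star count."""
    if growth_percent <= -100.0:
        raise ValueError("growth_percent must be greater than -100")
    return stars_now / (1.0 + growth_percent / 100.0)


if __name__ == "__main__":
    # Star counts and growth figures taken from the comparison table.
    for name, stars, growth in [("espnet", 7892, 1.3), ("WhisperLive", 1180, 11.9)]:
        prev = implied_previous_stars(stars, growth)
        print(f"{name}: about {prev:.0f} stars a month earlier (estimate)")
```

Under this assumption, a smaller project can show much higher growth than a larger one even when it gains fewer stars in absolute terms.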

espnet

Posts with mentions or reviews of espnet. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-17.

WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
9 projects | news.ycombinator.com | 17 Jan 2024

You might check out this list from espnet. They list the different corpora they use to train their models, sorted by language and task (ASR, TTS, etc.):
https://github.com/espnet/espnet/blob/master/egs2/README.md
[D] What's stopping you from working on speech and voice?
7 projects | /r/MachineLearning | 30 Jan 2023

- https://github.com/espnet/espnet
An Icelandic text-to-speech voice that can be used on a Mac
1 project | /r/Iceland | 16 Dec 2022
High quality, fast performing, local text to speech generation
1 project | /r/LanguageTechnology | 30 Nov 2022

This link has instructions for doing this for a Japanese model. It would have to be altered to work with ljspeech and the fine tune dataset.
Text to speech generation
3 projects | dev.to | 29 Nov 2022

This work is made possible by the excellent advancements in text to speech modeling. ESPnet is a great project and should be checked out for more advanced and wider-ranging use cases. This pipeline was also made possible by the great work from espnet_onnx in building a framework to export models to ONNX.
[P] TorToiSe - a true zero-shot multi-voice TTS engine
3 projects | /r/MachineLearning | 26 Apr 2022

CMU WavLab has ESPNet https://espnet.github.io/espnet/ which includes a number of high quality TTS models including VITS (which in my subjective experience is just as good as what is demonstrated here). Also the inference on various ESPNet pretrained TTS models is reasonable and sentences take on average 5 seconds per word to generate the waveform on my totally mid PC setup.
How to get Job in NLP?
1 project | /r/LanguageTechnology | 12 Dec 2021

The reason I'm saying this is to point out that having in-depth knowledge of speech processing/generation requires a lot of information about signal processing and human speech in general (e.g. acoustics and phonetics). However, if you're not into learning everything there is to know about a subject, just take one state-of-the-art example and study that as best as you can. Pick one environment/toolkit, for example espnet, and simply go with that.
Help picking a good speech recognition library
3 projects | /r/learnpython | 1 Dec 2021

https://github.com/espnet/espnet (kind of like a newer Kaldi, but also not beginner friendly)
speechbrain VS espnet - a user suggested alternative
2 projects | 13 Oct 2021

Both provide e2e ASR support, but espnet has more utilities, whereas speechbrain is clean.
Need help with training ASR model from scratch.
3 projects | /r/speechtech | 26 Mar 2021

This is a relatively small amount of speech to train the model from scratch, but you can train using another pre-trained model for initialization. There are a number of end-to-end ASR toolkits which can be used for this: https://github.com/NVIDIA/NeMo and https://github.com/espnet/espnet

WhisperLive

Posts with mentions or reviews of WhisperLive. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-01-29.

Show HN: WhisperFusion – Ultra-low latency conversations with an AI chatbot
7 projects | news.ycombinator.com | 29 Jan 2024

Everything runs locally, we use:
- WhisperLive for the transcription - https://github.com/collabora/WhisperLive
WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper
9 projects | news.ycombinator.com | 17 Jan 2024

Check out WhisperLive: https://github.com/collabora/WhisperLive
If you're grappling with the slow march from cool tech demos to real-world language model apps, you might wanna check out WhisperLive. It's this rad open-source project that’s all about leveraging Whisper models for slick live transcription. Think real-time, on-the-fly translated captions for those global meetups. It's a neat example of practical, user-focused tech in action. Dive into the details on their GitHub page
Whisper: Nvidia RTX 4090 vs. M1 Pro with MLX
10 projects | news.ycombinator.com | 13 Dec 2023

https://github.com/collabora/WhisperLive
There is another one that uses huggingface's implementation, but I haven't tried it since my spec doesn't support flash-att2
Triple Threat: The Power of Transcription, Summary, and Translation
1 project | news.ycombinator.com | 3 Aug 2023

Curious to see how this works? Check out our demo page - https://col.la/transcription to generate your own transcription, summary, and translation, or use our browser extension - https://github.com/collabora/WhisperLive to get live transcriptions.

What are some alternatives?

When comparing espnet and WhisperLive you can also consider the following projects:

speechbrain - A PyTorch-based Speech Toolkit

cog-whisper-diarization - Cog implementation of transcribing + diarization pipeline with Whisper & Pyannote

NeMo - A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

whisper-writer - 💬📝 A small dictation app using OpenAI's Whisper speech recognition model.

k2 - FSA/FST algorithms, differentiable, with PyTorch compatibility.

obs-zoom-and-follow - Dynamic zoom and mouse tracking script for OBS Studio

fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

gpt_chatbot - This chatbot lets you use your microphone to communicate with GPT-4. It uses the OpenAI text to speech to respond with a voice. It uses Pinecone to store long term information and retrieves it to create context. API keys for OpenAI and Pinecone required. Tested on Windows

kaldi-gstreamer-server - Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

whisper_streaming - Whisper realtime streaming for long speech-to-text transcription and translation

Kaldi Speech Recognition Toolkit - kaldi-asr/kaldi is the official location of the Kaldi project.

gpt-voice-conversation-chatbot - Allows you to have an engaging and safely emotive spoken / CLI conversation with the AI ChatGPT / GPT-4 while giving you the option to let it remember things discussed.
