lhotse
Tools for handling speech data in machine learning projects. (by lhotse-speech)
whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence (by linto-ai)
 | lhotse | whisper-timestamped |
---|---|---|
Mentions | 1 | 2 |
Stars | 866 | 1,547 |
Growth | 4.5% | 7.4% |
Activity | 9.0 | 8.1 |
Last commit | 2 days ago | 18 days ago |
Language | Python | Python |
License | Apache License 2.0 | GNU Affero General Public License v3.0 |
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lhotse
Posts with mentions or reviews of lhotse.
We have used some of these posts to build our list of alternatives and similar projects.
whisper-timestamped
Posts with mentions or reviews of whisper-timestamped.
We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-11-01.
- Show HN: AI Dub Tool I Made to Watch Foreign Language Videos with My 7-Year-Old
Yes. But Whisper's word-level timings are actually quite inaccurate out of the box. There are some Python libraries that mitigate that. I tested several of them. whisper-timestamped seems to be the best one. [0]
[0] https://github.com/linto-ai/whisper-timestamped
- AI-assisted removal of filler words from video recordings
whisper-timestamped, which is a layer on top of the Whisper set of models enabling us to get accurate word timestamps and include filler words in transcription output. This transcriber downloads the selected Whisper model to the machine running the demo and no third-party API keys are required.
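To make the filler-word use case above more concrete, here is a minimal sketch of how word-level timestamps could be turned into a cut list. It is not the method used by the post's author. It assumes each word is a dict with "text", "start", and "end" keys, which is the per-word shape whisper-timestamped reports; the sample words, the `FILLERS` set, and the helper names are all hypothetical.

```python
# Sketch: turn word-level timestamps into the time spans to keep
# once filler words are cut out. Sample data and names are invented.

FILLERS = {"um", "uh", "erm"}  # hypothetical filler vocabulary

SAMPLE_WORDS = [  # made-up words with start/end times in seconds
    {"text": "So", "start": 0.0, "end": 0.4},
    {"text": "um,", "start": 0.4, "end": 0.9},
    {"text": "hello", "start": 0.9, "end": 1.5},
    {"text": "uh", "start": 1.5, "end": 1.8},
    {"text": "world.", "start": 1.8, "end": 2.4},
]


def normalize(token):
    """Lowercase a token and strip surrounding spaces and punctuation."""
    return token.strip(" .,!?;:").lower()


def filler_spans(words, fillers=FILLERS):
    """Return the (start, end) span of every word that is a filler."""
    return [
        (w["start"], w["end"])
        for w in words
        if normalize(w["text"]) in fillers
    ]


def keep_spans(cuts, duration):
    """Return the spans of [0, duration] not covered by any cut."""
    keep, cursor = [], 0.0
    for start, end in sorted(cuts):
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


if __name__ == "__main__":
    cuts = filler_spans(SAMPLE_WORDS)
    print("cut: ", cuts)
    print("keep:", keep_spans(cuts, duration=2.4))
```

The kept spans could then be handed to a video tool such as FFmpeg for trimming and concatenation; that step is left out here because it depends on the recording format.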
What are some alternatives?
When comparing lhotse and whisper-timestamped you can also consider the following projects:
pykaldi - A Python wrapper for Kaldi
pywhisper - openai/whisper + extra features
comcrawl - A python utility for downloading Common Crawl data
wav2vec - pure numpy implementation of wav2vec 2.0
EmotiVoice - EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
pyannote-whisper
thunder-speech - A Hackable speech recognition library.
SincNet - SincNet is a neural architecture for efficiently processing raw audio samples.
SALMONN - SALMONN: Speech Audio Language Music Open Neural Network
FFmpeg - Mirror of https://git.ffmpeg.org/ffmpeg.git
zeta - Build high-performance AI models with modular building blocks
filler-word-removal