Top 14 Kaldi Open-Source Projects
-
vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
-
react-transcript-editor
A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs. - Work in progress
-
kaldi-active-grammar
Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
-
vosk-browser
A speech recognition library running in the browser thanks to a WebAssembly build of Vosk
Project mention: Amazon plans to charge for Alexa in June–unless internal conflict delays revamp | news.ycombinator.com | 2024-01-20
Yeah, Whisper is the closest thing we have, but even it needs more processing power than most of these edge devices have in order to feel smooth. I've started a voice interface project on a Raspberry Pi 4, and it takes about 3 seconds to produce a result. That's impressive, but not fast enough for an Alexa replacement.
From what I gather a Pi 5 can do it in 1.5 seconds, which is closer, so I suspect it's only a matter of time before we do have fully local STT running directly on speakers.
> Probably anathema to the space, but if the devices leaned into the ~five tasks people use them for (timers, weather, todo list?) could probably tighten up the AI models to be more accurate and/or resource efficient.
Yes, this is the approach taken by a lot of streaming STT systems, like Kaldi [0]. Rather than use a fully capable model, you train a specialized one that knows what kinds of things people are likely to say to it.
[0] http://kaldi-asr.org/
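A rough illustration of why a narrow domain helps: the sketch below is not Kaldi and not a trained model. It is a hypothetical post-processing step that snaps a noisy transcript onto a small, fixed command list. The command strings, the `snap_to_command` helper, and the 0.6 threshold are all made up for illustration; Kaldi-style systems get the same benefit more robustly by restricting the decoder's grammar or language model itself.

```python
from difflib import SequenceMatcher

# Hypothetical closed command set: the "~five tasks" a smart speaker handles.
COMMANDS = [
    "set a timer for five minutes",
    "what is the weather today",
    "add milk to my todo list",
    "turn off the lights",
]


def snap_to_command(transcript, commands=COMMANDS, threshold=0.6):
    """Return the allowed command whose word sequence best matches the
    transcript, or None if nothing is close enough (out-of-domain speech).

    This is a toy text-matching stand-in, not grammar-constrained decoding.
    """
    words = transcript.lower().split()
    best, best_score = None, 0.0
    for cmd in commands:
        # Word-level similarity in [0, 1]: 2 * matched_words / total_words.
        score = SequenceMatcher(None, words, cmd.split()).ratio()
        if score > best_score:
            best, best_score = cmd, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    print(snap_to_command("set a timer for fife minutes"))
    print(snap_to_command("what's the weather today"))
    print(snap_to_command("play some jazz"))
```

A misheard word ("fife") still lands on the right command because only four candidates compete, while "play some jazz" comes back as `None`, loosely mirroring a closed-grammar recognizer that has an explicit out-of-grammar rejection option.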
Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17
You might check out this list from espnet. It lists the different corpora used to train their models, sorted by language and task (ASR, TTS, etc.):
https://github.com/espnet/espnet/blob/master/egs2/README.md
Project mention: Infini-Gram: Scaling unbounded n-gram language models to a trillion tokens | news.ycombinator.com | 2024-05-05
Project mention: Ask HN: How do you get started with adding voice commands to a computer system? | news.ycombinator.com | 2023-11-21
https://github.com/dictation-toolbox/dragonfly
https://github.com/daanzu/kaldi-active-grammar
Kaldi related posts
-
Amazon plans to charge for Alexa in June–unless internal conflict delays revamp
-
Unsupervised (Semi-Supervised) ASR/STT training recipes
-
Steve's Explanation of the Viterbi Algorithm
-
Add TTS (text-to-speech) and ASR (automatic speech recognition) capabilities to an obscure language?
-
C++ for machine learning
-
An Icelandic speech-synthesis voice that can be used on Macs
-
The Advantages and Disadvantages of In-House Speech Recognition
-
Index
What are some of the best open-source Kaldi projects? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Kaldi Speech Recognition Toolkit | 13,788 |
| 2 | espnet | 7,932 |
| 3 | vosk-api | 7,149 |
| 4 | Dragonfire | 1,382 |
| 5 | pykaldi | 979 |
| 6 | lhotse | 869 |
| 7 | vosk-server | 848 |
| 8 | vosk-android-demo | 685 |
| 9 | react-transcript-editor | 536 |
| 10 | kaldi-active-grammar | 329 |
| 11 | vosk-browser | 336 |
| 12 | docker-kaldi-gstreamer-server | 288 |
| 13 | vosk-build-model | 61 |
| 14 | ovos-stt-plugin-vosk | 14 |