tevr-asr-tool
State-of-the-art (ranked #1 Aug 2022) German speech recognition in 284 lines of C++. This is a 100% private, 100% offline, 100% free CLI tool.
-
DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
And for the true "Show HN" experience:
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
cat tevr_asr_tool-1.0.0-Linux-x86_64.zip.00* > tevr_asr_tool-1.0.0-Linux-x86_64.zip
unzip tevr_asr_tool-1.0.0-Linux-x86_64.zip
sudo dpkg -i tevr_asr_tool-1.0.0-Linux-x86_64.deb
tevr_asr_tool --target_file=test_audio.wav
I wrote "284 lines of C++" to indicate that this is compact enough for people to actually read and understand the source code. Also, compiling my implementation is super easy and straightforward, which can't be said for Kaldi, Vosk, or DeepSpeech.
If you try to read the CTC beam search decoder from Mozilla's DeepSpeech [1], that alone is about 2000 LOC in multiple files.
If you try to read the pyctcdecode source that is used by HuggingFace [2], that's 1000+ LOC of Python.
But this implementation covers the entire client side, i.e. the equivalent of the whole "native_client" folder hierarchy in DeepSpeech [3], in a mere 284 lines.
[1] https://github.com/mozilla/DeepSpeech/tree/master/native_cli...
[2] https://github.com/kensho-technologies/pyctcdecode
[3] https://github.com/mozilla/DeepSpeech/tree/master/native_cli...
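The decoders cited above all implement CTC beam search. As a rough, hypothetical illustration of why the core idea can fit in a few dozen lines, here is a minimal CTC prefix beam search in C++17. It is not code from tevr-asr-tool, DeepSpeech, or pyctcdecode: it assumes per-frame softmax probabilities with token 0 as the CTC blank, has no language model, and the names `ctc_prefix_beam_search`, `BeamScore`, and `Prefix` are made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the tevr-asr-tool decoder.
// probs[t][c] = softmax probability of token c at frame t; token 0 is the CTC blank.
using Prefix = std::vector<int>;

struct BeamScore {
    double blank = 0.0;      // probability of all paths for this prefix that end in a blank
    double non_blank = 0.0;  // probability of all paths that end in the prefix's last token
    double total() const { return blank + non_blank; }
};

Prefix ctc_prefix_beam_search(const std::vector<std::vector<double>>& probs,
                              std::size_t beam_width, int blank = 0) {
    if (beam_width == 0) beam_width = 1;  // always keep at least one hypothesis
    std::map<Prefix, BeamScore> beams;
    beams[Prefix{}].blank = 1.0;  // start from the empty prefix

    for (const auto& frame : probs) {
        std::map<Prefix, BeamScore> next;
        for (const auto& [prefix, score] : beams) {
            for (int c = 0; c < static_cast<int>(frame.size()); ++c) {
                const double p = frame[c];
                if (c == blank) {
                    // A blank never changes the decoded prefix.
                    next[prefix].blank += score.total() * p;
                    continue;
                }
                Prefix extended = prefix;
                extended.push_back(c);
                if (!prefix.empty() && prefix.back() == c) {
                    // A repeated token only starts a new symbol if a blank separated it...
                    next[extended].non_blank += score.blank * p;
                    // ...otherwise it collapses into the existing last symbol.
                    next[prefix].non_blank += score.non_blank * p;
                } else {
                    next[extended].non_blank += score.total() * p;
                }
            }
        }
        // Prune: keep only the beam_width most probable prefixes.
        std::vector<std::pair<Prefix, BeamScore>> ranked(next.begin(), next.end());
        std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
            return a.second.total() > b.second.total();
        });
        if (ranked.size() > beam_width) {
            ranked.erase(ranked.begin() + static_cast<std::ptrdiff_t>(beam_width), ranked.end());
        }
        beams = std::map<Prefix, BeamScore>(ranked.begin(), ranked.end());
    }

    // Return the prefix with the highest total probability.
    return std::max_element(beams.begin(), beams.end(), [](const auto& a, const auto& b) {
        return a.second.total() < b.second.total();
    })->first;
}
```

For example, with frames favouring "a", then blank, then "a" again, the blank keeps the two "a"s apart and the result is {1, 1}; without that blank they would collapse into a single "a". Real decoders typically work in log space for numerical stability and add a language-model score whenever a prefix is extended; both are left out here to keep the sketch short.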