Show HN: State-of-the-Art German Speech Recognition in 284 lines of C++

tevr-asr-tool

9 406 5.9 C

State-of-the-art (ranked #1 Aug 2022) German Speech Recognition in 284 lines of C++. This is a 100% private 100% offline 100% free CLI tool.

And for the true "Show HN" experience:
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."
cat tevr_asr_tool-1.0.0-Linux-x86_64.zip.00* > tevr_asr_tool-1.0.0-Linux-x86_64.zip
unzip tevr_asr_tool-1.0.0-Linux-x86_64.zip
sudo dpkg -i tevr_asr_tool-1.0.0-Linux-x86_64.deb
tevr_asr_tool --target_file=test_audio.wav

DeepSpeech

68 24,380 0.0 C++

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

I wrote "284 lines of C++" to indicate that this is compact enough for people to actually read and understand the source code. Also, compiling my implementation is easy and straightforward, which can't be said for Kaldi, Vosk, or DeepSpeech.
If you try to read the CTC beam search decoder from Mozilla's DeepSpeech [1], that alone is about 2000 LOC spread across multiple files.
If you try to read the pyctcdecode source that is used by HuggingFace [2], that's 1000+ LOC of Python.
By contrast, this implementation covers the whole client side, i.e. the equivalent of the entire "native_client" folder hierarchy in DeepSpeech [3], in a mere 284 lines.
[1] https://github.com/mozilla/DeepSpeech/tree/master/native_cli...
[2] https://github.com/kensho-technologies/pyctcdecode
[3] https://github.com/mozilla/DeepSpeech/tree/master/native_cli...
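For readers wondering what a CTC decoder actually has to do, here is a minimal sketch of greedy (best-path) CTC decoding. It is not the beam search discussed above and not code taken from tevr-asr-tool, DeepSpeech, or pyctcdecode; the function name, the alphabet string, and the choice of blank index are all assumptions made for illustration. A beam search decoder keeps several candidate transcripts per frame instead of just the single best symbol, which is where most of the extra lines go.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy (best-path) CTC decoding, as an illustrative sketch:
//   1. pick the highest-scoring symbol in every frame,
//   2. merge consecutive repeats of the same symbol,
//   3. drop the blank symbol.
// `frames[t][i]` is the score of symbol i at time step t.
// `alphabet` and `blank` are hypothetical parameters, not a real tool's API.
std::string ctc_greedy_decode(const std::vector<std::vector<float>>& frames,
                              const std::string& alphabet,
                              std::size_t blank) {
    std::string out;
    std::size_t prev = blank;  // symbol chosen in the previous frame
    for (const auto& scores : frames) {
        if (scores.empty()) continue;  // skip malformed frames
        // Step 1: argmax over the symbols of this frame.
        std::size_t best = 0;
        for (std::size_t i = 1; i < scores.size(); ++i) {
            if (scores[i] > scores[best]) best = i;
        }
        // Steps 2 and 3: emit only on a change to a non-blank symbol.
        if (best != prev && best != blank && best < alphabet.size()) {
            out += alphabet[best];
        }
        prev = best;
    }
    return out;
}
```

With the alphabet "-ab" and blank index 0, the frame sequence a, a, blank, a, b, b decodes to "aab": the first two frames merge into one "a", and the blank separates it from the second "a".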

pyctcdecode

2 406 2.1 Python

A fast and lightweight python-based CTC beam search decoder for speech recognition.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Ask HN: Speech to text models, are they usable yet?

2 projects | news.ycombinator.com | 22 Oct 2023
Speech-to-Text in Real Time

1 project | news.ycombinator.com | 16 Jul 2023
Linux Mint XFCE

1 project | /r/linuxbrasil | 29 Apr 2023
Are there any secure and free auto transcription software ?

2 projects | /r/software | 19 Apr 2023
Deepspeech /common voice.

1 project | /r/mozilla | 14 Apr 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Deep Learning Machine Learning Tensorflow speech-recognition speech-to-text
Post date: 10 Aug 2022
