Show HN: State-of-the-Art German Speech Recognition in 284 lines of C++

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • tevr-asr-tool

    State-of-the-art (ranked #1 Aug 2022) German Speech Recognition in 284 lines of C++. This is a 100% private, 100% offline, 100% free CLI tool.

  • And for the true "Show HN" experience:

    wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."

    wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."

    wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."

    wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."

    wget "https://github.com/DeutscheKI/tevr-asr-tool/releases/downloa..."

    cat tevr_asr_tool-1.0.0-Linux-x86_64.zip.00* > tevr_asr_tool-1.0.0-Linux-x86_64.zip

    unzip tevr_asr_tool-1.0.0-Linux-x86_64.zip

    sudo dpkg -i tevr_asr_tool-1.0.0-Linux-x86_64.deb

    tevr_asr_tool --target_file=test_audio.wav

  • DeepSpeech

    DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

  • I wrote "284 lines of C++" to indicate that this is compact enough for people to actually read and understand the source code. Also, compiling my implementation is super easy and straightforward ... something which can't be said for Kaldi, Vosk, or DeepSpeech.

    If you try to read the CTC beam search decoder from Mozilla's DeepSpeech [1], that alone is about 2000 LOC in multiple files.

    If you try to read the pyctcdecode source that is used by HuggingFace [2], that's 1000+ LOC of Python.

    But this implementation covers the whole client side, i.e. the equivalent of the entire "native_client" folder hierarchy in DeepSpeech [3], narrowed down to a mere 284 lines.

    [1] https://github.com/mozilla/DeepSpeech/tree/master/native_cli...

    [2] https://github.com/kensho-technologies/pyctcdecode

    [3] https://github.com/mozilla/DeepSpeech/tree/master/native_cli...
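
For readers unfamiliar with what a CTC decoder does, here is a minimal sketch of the simplest variant, greedy (best-path) decoding. It is not taken from tevr-asr-tool, DeepSpeech, or pyctcdecode, all of which use beam search, which is considerably more involved. The function name ctc_greedy_decode, the frames[t][k] score layout, and the blank index parameter are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy (best-path) CTC decoding, NOT the beam search the comment refers to:
// pick the highest-scoring symbol per frame, merge consecutive repeats,
// then drop blanks.
// Assumed layout (hypothetical): frames[t][k] is the score of symbol k at
// time step t, alphabet[k] is the text for symbol k, and `blank` is the
// index of the CTC blank symbol.
std::string ctc_greedy_decode(const std::vector<std::vector<float>>& frames,
                              const std::vector<std::string>& alphabet,
                              std::size_t blank) {
    std::string text;
    std::size_t previous = blank;  // nothing has been emitted yet
    for (const auto& scores : frames) {
        if (scores.empty()) continue;  // skip malformed frames

        // Argmax over the symbols of this frame.
        std::size_t best = 0;
        for (std::size_t k = 1; k < scores.size(); ++k) {
            if (scores[k] > scores[best]) best = k;
        }

        // Emit only when the symbol changes and is not the blank.
        // A blank between two identical symbols keeps both of them.
        if (best != previous && best != blank && best < alphabet.size()) {
            text += alphabet[best];
        }
        previous = best;
    }
    return text;
}
```

Beam search decoders keep several candidate transcripts alive per frame instead of a single best symbol, and usually combine them with a language model, which is where most of the extra code in the projects above comes from.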

  • pyctcdecode

    A fast and lightweight python-based CTC beam search decoder for speech recognition.

Related posts

  • Ask HN: Speech to text models, are they usable yet?

    2 projects | news.ycombinator.com | 22 Oct 2023
  • Speech-to-Text in Real Time

    1 project | news.ycombinator.com | 16 Jul 2023
  • Linux Mint XFCE

    1 project | /r/linuxbrasil | 29 Apr 2023
  • Are there any secure and free auto transcription software ?

    2 projects | /r/software | 19 Apr 2023
  • Deepspeech /common voice.

    1 project | /r/mozilla | 14 Apr 2023