Nerd-dictation, hackable speech to text on Linux

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • vosk-api

    Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

  • Yeah, I was really impressed with the project when I encountered it last year when trying out a bunch of FLOSS Speech-To-Text options.

    It was significantly better than the other FLOSS options I looked at--both in terms of getting it going initially & the quality of the speech to text results.

    I tested it with a lightly modified version of this example script: https://github.com/alphacep/vosk-api/blob/master/python/exam...

    What I found particularly interesting was when you have the "partial" recognition output shown in real-time you get to see how--at the end of a sentence--it may change a word earlier in the sentence in the final recognition output based on (I guess) the additional context of the full sentence.

    (I just ran a quick test again, using the installs from my testing last year and an internal laptop microphone: the test script recognized a significant chunk of my speech (a headset definitely improves things, though), whereas in the same environment a test with `mic_vad_streaming` (from `DeepSpeech-examples-r0.9` with `deepspeech-0.9.0-models.pbmm`) failed to recognize any words at all.)
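
The partial-versus-final behaviour described in the comment above can be observed with a short loop over the Vosk Python bindings. This is a minimal sketch, not the example script the commenter used: `stream_transcripts` and `transcribe_wav` are hypothetical helper names, and the WAV file is assumed to be 16-bit mono PCM with an unpacked Vosk model directory available locally. The recognizer calls (`AcceptWaveform`, `PartialResult`, `Result`, `FinalResult`) are the ones exposed by `vosk.KaldiRecognizer`.

```python
import json
from typing import Iterable, Iterator, Tuple


def stream_transcripts(recognizer, chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield ("partial", text) while an utterance is in progress and
    ("final", text) once the recognizer reports the utterance has ended.

    `recognizer` is any object with the KaldiRecognizer-style methods
    AcceptWaveform, PartialResult, Result and FinalResult.
    """
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # Utterance boundary: Result() is a JSON string with a "text" field.
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield ("final", text)
        else:
            # Still mid-utterance: PartialResult() carries a "partial" field,
            # which may later be revised in the final text.
            text = json.loads(recognizer.PartialResult()).get("partial", "")
            if text:
                yield ("partial", text)
    # Flush whatever audio is still buffered at end of stream.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield ("final", text)


def transcribe_wav(wav_path: str, model_dir: str) -> None:
    """Hypothetical driver: print partial and final results for a WAV file."""
    import wave
    from vosk import Model, KaldiRecognizer  # assumes `pip install vosk`

    with wave.open(wav_path, "rb") as wav:  # assumed: 16-bit mono PCM
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())

        def read_chunks() -> Iterator[bytes]:
            while True:
                data = wav.readframes(4000)
                if not data:
                    break
                yield data

        for kind, text in stream_transcripts(recognizer, read_chunks()):
            print(f"{kind:>7}: {text}")
```

Calling `transcribe_wav("speech.wav", "model")` (both paths are placeholders) prints a growing series of partial lines followed by a final line per utterance; comparing the last partial with the final shows any words that were revised once the full sentence was available.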

  • nerd-dictation

    Simple, hackable offline speech to text - using the VOSK-API.

  • dicio-android

    Dicio assistant app for Android

  • Kaldi Speech Recognition Toolkit

    kaldi-asr/kaldi is the official location of the Kaldi project.

  • Vosk-api isn't an STT engine itself; it is built on the Kaldi speech recognition toolkit (https://github.com/kaldi-asr/kaldi) and nicely implements and packages an API for Kaldi chain/LF-MMI models.

  • larynx

    (Discontinued) End-to-end text-to-speech system using gruut and onnx

  • Yes!

    The project is called Larynx, and it is amazing: https://github.com/rhasspy/larynx/

    I waxed lyrical about it recently in this thread about private alternatives to Alexa: https://news.ycombinator.com/item?id=29562526

    I can only vouch for the quality/variety in English but it does note support for 50 voices over 9 languages, including all the first group of languages you mentioned, and also Russian. (I've "played" with all those languages to test them but can't really vouch for how a native speaker/listener might find it. :D )

    It is miles ahead of any of the other Free/Open Source TTS solutions I've tried, including the ones you mentioned.

    (It's still synthesized speech but the output quality is so good and the project is still extremely early days.)

    And there's a range of options in accent & gender--which are in general sorely lacking in other FLOSS TTS options. (In terms of licensing, some voices are licensed more freely than others but the majority are without significant restriction.)

    I like Larynx so much that I've been working on an editor for it to assist in "auditioning" & recording speech in a narrative context, e.g. game/film pre-viz.

  • PeerTube

    ActivityPub-federated video streaming platform using P2P directly in your web browser

  • I just checked and apparently they are already aware: https://github.com/Chocobozzz/PeerTube/issues/3325#issuecomm... :)

    (Tho I'll admit I have no idea what "bluffing" means in that context. :D )

  • recasepunc

    Model for recasing and repunctuating ASR transcripts

  • Was just about to mention this repo to the OP but suspect I found it from your site in the first place: https://github.com/benob/recasepunc :D

    Punctuation/capitalization will make a massive difference to practical use! Look forward to it.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
