Voice2json: Offline speech and intent recognition on Linux

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • common-voice

    Common Voice is part of Mozilla's initiative to help teach machines how real people speak.

  • I like the idea, and decided to try doing some validation. The first thing I noticed is that it asks me to make a yes-or-no judgment of whether the sentence was spoken "accurately", but nowhere on the site is it explained what "accurate" means, or how strict I should be.

    (The first clip I got was spoken more or less correctly, but a couple of words are slurred together and the prosody is awkward. Without having a good idea of the standards and goals of the project, I have no idea whether including this clip would make the overall dataset better or worse. My gut feeling is that it's good for training recognition, and bad for training synthesis.)

    This seems to me like a major issue, since it should take a relatively small amount of effort to write up a list of guidelines, and it would be hugely beneficial to establish those guidelines before asking a lot of volunteers to donate their time. I don't find it encouraging that this has been an open issue for four years, with apparently no action except a bunch of bikeshedding: https://github.com/common-voice/common-voice/issues/273

  • TTS

    πŸΈπŸ’¬ - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

  • I'd check out coqui https://coqui.ai/

    It's well-documented and works basically out of box. I wish the STT models bundled were closer to the quality of Kaldi but the ease-of-use has no comparisons.

    And maybe with time it will surpass Kaldi in quality too.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • vosk-server

    WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries

  • julius

    Open-Source Large Vocabulary Continuous Speech Recognition Engine (by julius-speech)

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts