Why, in 2022, is there no high quality method for voice control of a PC?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • kaldi-active-grammar

    Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time

  • "Everything other than talon has terrible latency": False! I develop kaldi-active-grammar (https://github.com/daanzu/kaldi-active-grammar), a free and open source speech recognition backend, which has extremely low latency. You can adjust how aggressive the VAD (voice activity detection) is to suit your preference, but the speech engine latency can be almost negligible, especially for voice commands (vs prose dictation). However, I agree that "most existing speech recognition engines were not designed with the kind of latency you want for quick one syllable commands", and that low latency is pivotal to being productive with voice commands. I also agree with your other points.
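The trade-off mentioned above, how aggressive the VAD is versus how quickly a command is recognized, can be illustrated with a toy endpointer. This is a minimal sketch and not kaldi-active-grammar's API: the function name `detect_utterances`, the energy threshold, and the `trailing_silence_frames` parameter are all hypothetical, and the sketch assumes the audio has already been reduced to one energy value per frame.

```python
from typing import List, Tuple


def detect_utterances(
    energies: List[float],
    threshold: float,
    trailing_silence_frames: int,
) -> List[Tuple[int, int, int]]:
    """Split a stream of per-frame energies into utterances.

    Returns one (start, last_speech, decided_at) tuple per utterance, where
    decided_at is the frame at which the endpointer declared the utterance
    finished. decided_at - last_speech is the latency added by the VAD.
    All names and parameters here are hypothetical, for illustration only.
    """
    utterances = []
    start = None        # first speech frame of the current utterance
    last_speech = None  # most recent frame at or above the threshold
    silence_run = 0     # consecutive silent frames since last_speech

    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_speech = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            # A more aggressive VAD waits for fewer silent frames.
            if silence_run >= trailing_silence_frames:
                utterances.append((start, last_speech, i))
                start = None
                silence_run = 0

    # The stream ended while speech was still open: close it at the last frame.
    if start is not None:
        utterances.append((start, last_speech, len(energies) - 1))
    return utterances


frames = [0, 0, 5, 6, 5, 0, 0, 0, 0, 0]
print(detect_utterances(frames, threshold=1.0, trailing_silence_frames=2))  # [(2, 4, 6)]
print(detect_utterances(frames, threshold=1.0, trailing_silence_frames=4))  # [(2, 4, 8)]
```

With the aggressive setting the utterance is closed two frames after speech stops; with the conservative one it takes four. The cost of the aggressive setting is that a short pause inside a longer phrase is more likely to be mistaken for the end of the utterance, which matters more for prose dictation than for short commands.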

  • cursorless-talon

    The cursor never loved you anyway (by cursorless-dev)

  • nerd-dictation

    Simple, hackable offline speech to text - using the VOSK-API.

  • I’ve been messing around with https://github.com/ideasman42/nerd-dictation which, with the big model, gives surprisingly accurate local recognition. It is definitely more DIY/hacker-focused than a finished solution, though.

  • Voice-Recognition-using-Deep-Learning

    Voice Recognition using Deep Learning

  • Now top that off with accents, such as a Hispanic speaker's, or regional slang. Deep learning kits like https://github.com/FreddieAbad/Voice-Recognition-using-Deep-... are making headway but are still far from general voice recognition.

  • cursorless

    Don't let the cursor slow you down

  • Thanks jiehong!

    The reason that the hats are always present is that the way to code faster by voice than by keyboard is to speak fluently, minimising pauses, the way we speak regular human languages. If we had to say a command and then wait for the hats to appear, that would break the chain.

    Re: mapping, we use something called the "Command server", which allows us to use file-based RPC to run commands in VSCode. That way it is easy to send more complex commands, which are required by Cursorless.

    IntelliJ support is definitely one of the most requested features; once I'm done rewriting some of the core engine I'll probably take a swing at that. Here's the issue that tracks extracting cursorless into a node.js server so that it can be used by other editors: https://github.com/pokey/cursorless-vscode/issues/435
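The "file-based RPC" idea mentioned in the comment above can be sketched roughly as follows. This is an illustration of the general pattern only, not the actual Command server protocol: the file names `request.json` and `response.json`, the JSON fields, and all function names are assumptions made for this example, and how the editor is told that a request is waiting is left out.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path
from typing import Any, Callable, Dict, List


def _write_json_atomically(path: Path, payload: dict) -> None:
    """Write to a temp file first, then rename, so readers never see a partial file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    os.replace(tmp, path)


def send_request(rpc_dir: Path, command_id: str, args: List[Any]) -> str:
    """Client side (e.g. the voice engine): drop a request file, return its id."""
    request_id = str(uuid.uuid4())
    _write_json_atomically(
        rpc_dir / "request.json",  # hypothetical file name
        {"uuid": request_id, "command_id": command_id, "args": args},
    )
    return request_id


def serve_once(rpc_dir: Path, handlers: Dict[str, Callable[..., Any]]) -> None:
    """Server side (e.g. an editor extension): run one request, write the response."""
    request = json.loads((rpc_dir / "request.json").read_text(encoding="utf-8"))
    handler = handlers.get(request["command_id"])
    if handler is None:
        response = {"uuid": request["uuid"], "error": "unknown command"}
    else:
        response = {"uuid": request["uuid"], "result": handler(*request["args"])}
    _write_json_atomically(rpc_dir / "response.json", response)


def read_response(rpc_dir: Path, request_id: str) -> dict:
    """Client side: read the response and check that it answers our request."""
    response = json.loads((rpc_dir / "response.json").read_text(encoding="utf-8"))
    if response["uuid"] != request_id:
        raise RuntimeError("response does not match the request that was sent")
    return response


with tempfile.TemporaryDirectory() as tmp_dir:
    rpc_dir = Path(tmp_dir)
    request_id = send_request(rpc_dir, "add", [2, 3])
    serve_once(rpc_dir, {"add": lambda a, b: a + b})
    print(read_response(rpc_dir, request_id)["result"])  # 5
```

Because the payload is a JSON file rather than a simulated keystroke, a command can carry structured arguments of arbitrary size, which fits the comment's point about sending more complex commands.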

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
