"Everything other than talon has terrible latency": False! I develop kaldi-active-grammar (https://github.com/daanzu/kaldi-active-grammar), a free and open source speech recognition backend, which has extremely low latency. You can adjust how aggressive the VAD (voice activity detection) is to suit your preference, but the speech engine latency can be almost negligible, especially for voice commands (vs prose dictation). However, I agree that "most existing speech recognition engines were not designed with the kind of latency you want for quick one syllable commands", and that low latency is pivotal to being productive with voice commands. I also agree with your other points.
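To illustrate the "adjust how aggressive the VAD is" trade-off mentioned above, here is a toy energy-threshold VAD sketch. This is not kaldi-active-grammar's implementation, and the threshold values are made up for illustration; real VADs (and the one in kaldi-active-grammar) are considerably more sophisticated.

```python
# Toy energy-based VAD sketch (NOT kaldi-active-grammar's implementation).
# Higher "aggressiveness" raises the energy threshold, so borderline audio is
# classified as silence sooner: lower latency at the cost of clipped speech.
import math

def is_speech(frame, aggressiveness=1):
    """Classify one frame of 16-bit PCM samples as speech or silence.

    aggressiveness: 0 (permissive) .. 3 (strict). Threshold values are
    arbitrary placeholders, not tuned constants.
    """
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    thresholds = {0: 100.0, 1: 300.0, 2: 700.0, 3: 1500.0}
    return rms > thresholds[aggressiveness]

# A loud frame is speech at any setting; a quiet one only at low settings.
loud = [2000, -2000] * 80   # 160-sample frame (10 ms at 16 kHz)
quiet = [200, -200] * 80
print(is_speech(loud, 3))   # True
print(is_speech(quiet, 0))  # True
print(is_speech(quiet, 3))  # False
```

The knob a user would turn is just `aggressiveness`: for short one-syllable commands you want the strictest setting that still reliably catches your speech, since the end-of-utterance decision dominates perceived latency.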
I’ve been messing around with https://github.com/ideasman42/nerd-dictation which, with the big model, gives surprisingly accurate local transcriptions. It’s definitely more DIY/hacker-focused than a turnkey solution, though.
Now top that off with accents, such as those of Hispanic speakers, or regional slang. Deep learning kits like https://github.com/FreddieAbad/Voice-Recognition-using-Deep-... are making headway, but they are still far from general-purpose voice recognition.
Thanks jiehong!
The reason that the hats are always present is that the way to code faster by voice than by keyboard is to speak fluently, minimising pauses, the way we speak regular human languages. If we had to say a command and then wait for the hats to appear, that would break the flow.
Re mapping, we use something called the "command server", which lets us use file-based RPC to run commands in VSCode. That makes it easy to send the more complex commands that Cursorless requires.
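To make the file-based RPC idea concrete, here is a minimal round-trip sketch: the client drops a request file, the server picks it up, runs the command, and writes a response file. The file names, directory layout, and payload schema here are hypothetical illustrations, not the actual command-server protocol.

```python
# Minimal sketch of file-based RPC (hypothetical protocol, not the real
# command server): requests and responses are exchanged as JSON files in a
# shared directory that both sides can see.
import json
import pathlib
import tempfile

def write_request(rpc_dir, command, args):
    """Client side: drop a request file for the editor extension to pick up."""
    payload = {"command": command, "args": args}
    (rpc_dir / "request.json").write_text(json.dumps(payload))

def handle_request(rpc_dir, handlers):
    """Server side: read the request, dispatch it, write a response file."""
    payload = json.loads((rpc_dir / "request.json").read_text())
    result = handlers[payload["command"]](*payload["args"])
    (rpc_dir / "response.json").write_text(json.dumps({"result": result}))

def read_response(rpc_dir):
    """Client side: collect the result (a real client would poll with a timeout)."""
    return json.loads((rpc_dir / "response.json").read_text())["result"]

# Round-trip demo with an in-process "server" standing in for the editor.
rpc_dir = pathlib.Path(tempfile.mkdtemp())
write_request(rpc_dir, "upper", ["hello cursorless"])
handle_request(rpc_dir, {"upper": lambda s: s.upper()})
print(read_response(rpc_dir))  # HELLO CURSORLESS
```

The appeal of files over sockets here is that any process that can write to disk can issue commands, and arbitrarily structured payloads (like Cursorless's rich command objects) serialize naturally as JSON.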
IntelliJ support is definitely one of the most requested features; once I'm done rewriting some of the core engine I'll probably take a swing at that. Here's the issue that tracks extracting cursorless into a node.js server so that it can be used by other editors: https://github.com/pokey/cursorless-vscode/issues/435