I like the idea, and decided to try doing some validation. The first thing I noticed is that it asks me to make a yes-or-no judgment of whether the sentence was spoken "accurately", but nowhere on the site is it explained what "accurate" means, or how strict I should be.
(The first clip I got was spoken more or less correctly, but a couple of words were slurred together and the prosody was awkward. Without a good idea of the project's standards and goals, I have no idea whether including this clip would make the overall dataset better or worse. My gut feeling is that it's good for training recognition and bad for training synthesis.)
This seems to me like a major issue: it should take relatively little effort to write up a list of guidelines, and establishing them before asking a lot of volunteers to donate their time would be hugely beneficial. I don't find it encouraging that this has been an open issue for four years, with apparently no action except a bunch of bikeshedding: https://github.com/common-voice/common-voice/issues/273
I'd check out coqui https://coqui.ai/
It's well-documented and works basically out of the box. I wish the bundled STT models were closer to Kaldi's quality, but the ease of use is unmatched.
And maybe, with time, it will surpass Kaldi in quality too.