Since you know that each recording contains one or two phonemes (one for a vowel, two for a consonant), you can find where in the recording the utterance takes place. This is a simplified form of "forced alignment".
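A minimal sketch of that "find where the utterance is" step: split the recording into short frames, compute each frame's energy, and keep the frames above a threshold relative to the loudest frame. The frame length and threshold below are illustrative, not tuned values, and the signal is synthetic.

```python
def find_utterance(samples, frame_len=160, threshold=0.1):
    """Return (start, end) sample indices of the high-energy region."""
    # Per-frame energy: sum of squared samples in each frame.
    energies = [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples), frame_len)
    ]
    peak = max(energies)
    # Frames whose energy exceeds a fraction of the loudest frame.
    active = [i for i, e in enumerate(energies) if e >= threshold * peak]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Synthetic example: silence, then a burst, then silence.
signal = [0.0] * 800 + [0.5, -0.5] * 400 + [0.0] * 800
print(find_utterance(signal))  # → (800, 1600)
```

With one phoneme per recording you can take the detected region directly; with two (a consonant plus vowel) you would still need to split the region, which is where a real forced-alignment tool earns its keep.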
-
And for phoneme recognition:
- This looks like it could be useful (I'm sure you won't mind if it's "phones" instead of "phonemes"): https://github.com/xinjli/allosaurus
- About using standard speech-recognition tools: https://cmusphinx.github.io/wiki/phonemerecognition/
-
SpeechRecognition
Speech recognition module for Python, supporting several engines and APIs, online and offline.
I’m not who you replied to, but I saw that the Sphinx integration has a keyword-recognizer API: https://github.com/Uberi/speech_recognition/blob/master/examples/special_recognizer_features.py
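A sketch of how that keyword API can be used, assuming `pip install SpeechRecognition pocketsphinx`; the file name `command.wav` and the keywords are made up for illustration. `recognize_sphinx` accepts a `keyword_entries` list of `(keyword, sensitivity)` pairs, with sensitivity between 0 (strict) and 1 (permissive).

```python
def make_keyword_entries(words, sensitivity=0.8):
    # keyword_entries format expected by recognize_sphinx:
    # a list of (keyword, sensitivity) pairs, sensitivity in [0, 1].
    return [(word, sensitivity) for word in words]

def spot_keywords(wav_path, words):
    # Import here so the helper above works without pocketsphinx installed.
    import speech_recognition as sr
    r = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = r.record(source)
    try:
        return r.recognize_sphinx(audio, keyword_entries=make_keyword_entries(words))
    except sr.UnknownValueError:
        return ""  # no keyword detected
```

Usage would look like `spot_keywords("command.wav", ["forward", "stop"])`, returning whatever keywords Sphinx spotted in the file.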
-
common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
Check out Mozilla's Common Voice. It's a great project; it's easy to participate in, and the data is easy to use. (BTW, they've also released DeepSpeech for speech recognition.)