Our great sponsors
-
determined
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
-
snowboy
Future versions with model training module will be maintained through a forked version here: https://github.com/seasalt-ai/snowboy
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
Spoken-Keyword-Spotting
In this repository, we explore using a hybrid system consisting of a Convolutional Neural Network and a Support Vector Machine for Keyword Spotting task.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
Check out Determined https://github.com/determined-ai/determined to help manage this kind of work at scale: Determined leverages Horovod under the hood, automatically manages cloud resources and can get you up on spot instances, T4's, etc. and will work on your local cluster as well. Gives you additional features like experiment management, scheduling, profiling, model registry, advanced hyperparameter tuning, etc.
Full disclosure: I'm a founder of the project.
Great question. This is technically referred to as "Wake Word Detection". You run a really small model locally that is just processing 500ms (for example) of audio at a time through a light weight CNN or RNN. The idea here is that it's just binary classification (vs actual speech recognition).
There are some open source libraries that make this relatively easy:
- https://github.com/Kitt-AI/snowboy (looks to be shutdown now)
- https://github.com/cmusphinx/pocketsphinx
This avoids having to stream audio 24x7 to a cloud model which would be super expensive. This being said, I'm pretty sure what the Alexa does, for example, is send any positive wake word to a cloud model (that is bigger and more accurate) to verify the prediction of the local wake word detection model AFAIK.
The search term you're looking for is "Keyword Spotting" - and that's what's implemented locally for ~embedded devices that sit and wait for something relevant to come along so that they know when to start sending data up to the mothership (or even turn on additional higher-power cores locally).
Here's an example repo that might be interesting (from initial impressions, though there are many more out there) : https://github.com/vineeths96/Spoken-Keyword-Spotting