Is it possible to self-host a voice assistant?

This page summarizes the projects mentioned and recommended in the original post on /r/selfhosted

  • mycroft-core

    Mycroft Core, the Mycroft Artificial Intelligence platform.

  • You mean like https://mycroft.ai/ ?

  • rhasspy-mobile-app

    A simple mobile app for rhasspy.

  • karen-app

    This is the app for Karen, an open-source voice assistant.

  • selene-backend

    Microservices and web apps to support Mycroft devices

  • Porcupine

    On-device wake word detection powered by deep learning. A minimal wake-word loop sketch follows this list.

  • Consider looking at https://github.com/mozilla/DeepSpeech . You can get pre-compiled versions of it, including builds that run on a Raspberry Pi, and yes, it's all local, though your mileage may vary. There is also https://picovoice.ai/ , which runs everything locally on the machine, but again, each uses a constrained local language model/syntax. The other real question, as https://www.reddit.com/user/eduncan911/ correctly states, is the wake word: most systems process all sound, i.e. they are listening all the time. Alexa, Google Assistant, and similar devices embed a smaller model or use dedicated hardware/neural networks to recognize the wake word before passing the audio on for further processing. So think of most of these devices as always listening and processing, and factor that into power usage, etc.

  • DeepSpeech

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers. A short transcription sketch follows this list.

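To make the always-listening behaviour described above concrete, here is a minimal wake-word loop using Porcupine's Python bindings. Treat it as a sketch under stated assumptions, not a finished assistant: it assumes the pvporcupine and pvrecorder packages are installed, and YOUR_ACCESS_KEY is a placeholder for a Picovoice access key.

```python
# Minimal always-listening wake-word loop with Porcupine.
# Assumptions: pip install pvporcupine pvrecorder; YOUR_ACCESS_KEY is a placeholder.
import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(
    access_key="YOUR_ACCESS_KEY",  # placeholder: issued by the Picovoice console
    keywords=["porcupine"],        # one of the bundled built-in keyword models
)
recorder = PvRecorder(frame_length=porcupine.frame_length, device_index=-1)

try:
    recorder.start()
    print("Listening for the wake word...")
    while True:
        pcm = recorder.read()  # one frame of 16-bit, 16 kHz PCM samples
        if porcupine.process(pcm) >= 0:
            # Wake word heard: this is where audio would be handed off to a
            # speech-to-text engine such as DeepSpeech (sketched below).
            print("Wake word detected")
finally:
    recorder.stop()
    recorder.delete()
    porcupine.delete()
```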

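For the speech-to-text stage that the wake word gates into, an offline transcription sketch with DeepSpeech's Python API might look like this. The model and scorer file names match the 0.9.3 release artifacts, utterance.wav is a placeholder input, and the model expects 16-bit, 16 kHz mono audio.

```python
# Offline transcription of a short WAV file with DeepSpeech.
# Assumptions: pip install deepspeech, plus the released 0.9.3 model files;
# utterance.wav is a placeholder for your own 16-bit, 16 kHz mono recording.
import wave

import deepspeech
import numpy as np

model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")  # optional language model

with wave.open("utterance.wav", "rb") as w:
    assert w.getframerate() == model.sampleRate(), "model expects 16 kHz audio"
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(model.stt(audio))  # the recognized text
```
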
NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
