Web Speech API is (still) broken on Linux circa 2023

This page summarizes the projects mentioned and recommended in the original post on /r/javascript.

  • native-messaging-espeak-ng

    Native Messaging => eSpeak NG => MediaStreamTrack

  • To prove the requirement is possible, I have already implemented a direct connection to a local speech synthesis engine: https://github.com/guest271314/native-messaging-espeak-ng. It processes SSML input, lets the audio output be paused and resumed (the file can be played with an HTMLMediaElement in the browser), and creates a MediaStreamTrack from the parsed WAV so the stream can be recorded and shared with peers.
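The Native Messaging leg of that pipeline uses a simple framing: each message is UTF-8 encoded JSON preceded by a 32-bit length in native byte order. The following is a minimal sketch of that framing, not code taken from native-messaging-espeak-ng. The function names and the `{ text: ... }` message shape are hypothetical, and little-endian byte order is assumed (typical for x86-64 and ARM Linux hosts).

```typescript
// Sketch of Native Messaging framing: 4-byte length prefix + UTF-8 JSON body.
// Assumption: the host machine is little-endian.
const LITTLE_ENDIAN = true;

// Serialize a value to JSON and prepend its byte length as a 32-bit integer.
function encodeNativeMessage(message: unknown): Uint8Array {
  const body = new TextEncoder().encode(JSON.stringify(message));
  const framed = new Uint8Array(4 + body.length);
  new DataView(framed.buffer).setUint32(0, body.length, LITTLE_ENDIAN);
  framed.set(body, 4);
  return framed;
}

// Read the length prefix, then parse exactly that many bytes as JSON.
function decodeNativeMessage(framed: Uint8Array): unknown {
  if (framed.byteLength < 4) {
    throw new Error("Missing 4-byte length prefix");
  }
  const view = new DataView(framed.buffer, framed.byteOffset, framed.byteLength);
  const length = view.getUint32(0, LITTLE_ENDIAN);
  if (framed.byteLength < 4 + length) {
    throw new Error("Incomplete message body");
  }
  const body = framed.subarray(4, 4 + length);
  return JSON.parse(new TextDecoder().decode(body));
}
```

A native host would read frames like this from stdin and write them to stdout. What the payload contains (for example SSML input or WAV data) is up to the extension and the host.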

  • SSMLParser

    Implement SSML parsing for Web Speech API

  • Or, as you noted, split the synthesis over multiple SpeechSynthesisUtterance instances in a user-defined queue, calling speak() in succession. That's what I do here: https://guest271314.github.io/SSMLParser/.
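The queue approach described above could look roughly like the sketch below. It is not the SSMLParser implementation: the helper names, the sentence-based splitting, and the 200-character default are all assumptions made for illustration, and it handles plain text only, not SSML. Only `SpeechSynthesisUtterance` and `speechSynthesis.speak()` come from the Web Speech API.

```typescript
// Split text into sentence-based chunks of at most maxLength characters.
// A single sentence longer than maxLength is kept whole.
function splitIntoChunks(text: string, maxLength = 200): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxLength) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Speak chunks one after another; each call waits for the previous to finish.
async function speakInSuccession(
  chunks: string[],
  speakOne: (text: string) => Promise<void>,
): Promise<void> {
  for (const chunk of chunks) {
    await speakOne(chunk);
  }
}

// Browser-only: wrap one utterance in a Promise that settles on end or error.
// The globals are reached through `globalThis as any` so the file also
// compiles outside a DOM environment.
function speakWithWebSpeech(text: string): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const g = globalThis as any;
    const utterance = new g.SpeechSynthesisUtterance(text);
    utterance.onend = () => resolve();
    utterance.onerror = (event: any) => reject(new Error(String(event.error)));
    g.speechSynthesis.speak(utterance);
  });
}
```

In a browser, `speakInSuccession(splitIntoChunks(longText), speakWithWebSpeech)` would speak the chunks in order. Passing the speak function in as a parameter keeps the queue logic independent of the Web Speech API, so a different backend can be swapped in where the browser implementation does not work.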

  • GoogleNetworkSpeechSynthesis

    Google's Network Speech Synthesis: Bring your own Google API key and proxy

  • This is how you can make the request yourself: GoogleNetworkSpeechSynthesis.

  • TTS

    :robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts) (by mozilla)

  • There is a lot of TTS and STT development going on (https://github.com/mozilla/TTS; https://github.com/mozilla/DeepSpeech; https://github.com/common-voice/common-voice). Contributions from the wild are the only way these projects work.

  • DeepSpeech

    DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.


  • common-voice

    Common Voice is part of Mozilla's initiative to help teach machines how real people speak.


