Thanks! Try the --raw-stream option for listening to long texts: https://github.com/rhasspy/larynx#long-texts
For speech-dispatcher, I'd start a Larynx HTTP server and use curl to get audio. I have an undocumented --daemon flag that does something like this.
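A minimal sketch of that curl-style approach in Python: build a GET request against a locally running Larynx HTTP server and read back WAV bytes. The port (5002) and the /api/tts endpoint are assumptions based on Larynx's web server defaults, so check your own install before relying on them.

```python
# Sketch: fetch synthesized audio from a local Larynx HTTP server.
# Assumes the server is started separately (e.g. via larynx's server
# mode) and listens on port 5002 with a /api/tts endpoint -- both are
# assumptions, not documented guarantees.
from urllib.parse import urlencode
from urllib.request import urlopen


def tts_url(text, host="localhost", port=5002):
    """Build the request URL for one utterance."""
    query = urlencode({"text": text})
    return f"http://{host}:{port}/api/tts?{query}"


url = tts_url("Hello from Larynx")

# With a server actually running, this returns WAV bytes you could
# hand to speech-dispatcher or pipe into aplay:
# wav_bytes = urlopen(url).read()
```

The same request is a one-liner with curl once you know the endpoint; the Python version just makes the URL encoding explicit.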
A nice enhancement for the system is having TTS read out the currently selected text, triggered by a key shortcut.
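That "read the selected text" idea can be wired up with two common tools: xclip to grab the X11 primary selection and spd-say (speech-dispatcher's client) to speak it. Both tool choices are just one possible setup; any clipboard reader and TTS front-end compose the same way. Bind the script to a key shortcut in your desktop environment.

```shell
#!/bin/sh
# Speak the currently highlighted text (X11 primary selection).

get_selection() {
    # The primary selection holds the most recently highlighted text.
    xclip -o -selection primary
}

speak() {
    # --pipe-mode makes spd-say read the text to speak from stdin.
    spd-say --pipe-mode
}

if command -v xclip >/dev/null 2>&1 && command -v spd-say >/dev/null 2>&1; then
    get_selection | speak
else
    echo "xclip or spd-say not installed" >&2
fi
```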
I tried festival, but it was too complicated and my version was too old to run the better voice models.
Instead I've used this repo to build an upgraded flite: https://github.com/kastnerkyle/hmm_tts_build/
I have mapped keyboard shortcuts Win+1 for normal speed, Win+2 for faster and Win+3 for really fast reading speed. I can use it while reading, to enhance my focus. Neat.
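The three-speeds idea can be collapsed into one script that takes a preset name, so each shortcut (Win+1/2/3) just calls it with a different argument. This sketch uses speech-dispatcher's -100..100 rate scale via spd-say; the preset numbers are guesses to tune by ear, and the spd-say dependency is an assumption (the commenter's flite build would slot in the same way).

```shell
#!/bin/sh
# Read stdin aloud at a named speed preset: normal | fast | faster.
# Usage: bind "read-aloud fast" etc. to your keyboard shortcuts.

rate_for() {
    # Map preset names to speech-dispatcher rates (-100..100).
    case "$1" in
        normal) echo 0 ;;
        fast)   echo 40 ;;
        faster) echo 80 ;;
        *)      echo 0 ;;
    esac
}

preset="${1:-normal}"
rate="$(rate_for "$preset")"

if command -v spd-say >/dev/null 2>&1 && [ -t 0 ]; then
    # --pipe-mode reads the text to speak from stdin.
    spd-say -r "$rate" --pipe-mode
else
    echo "would run: spd-say -r $rate --pipe-mode"
fi
```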
I worked with this a bit not that long ago. For cloud services, the quality of Google's and Azure's "neural" voices is tough to beat. Interestingly, I experienced significant latency with all of the Azure services regardless of region, configuration, etc., and never dug deep enough to figure out what was going on there. Also of note: Azure will let you run their implementation in a local container, with the usual "contact us" process. Not sure of the terms and pricing on that.
For local use, Mozilla TTS was the best from a quality standpoint, but GPU inference was a bit dicey and (possibly) not officially supported at all.
For more complex and bespoke applications, the Nvidia (I know, I know) NeMo toolkit [0] is very powerful, but it requires more effort than most to get up and running. In exchange, it lets you do very interesting things with additional training and all things speech.
In the Nvidia world there's also their Riva [1] (formerly Jarvis) solution that works with Triton [2] to build out an architecture for extremely performant and high-scale speech applications with things like model management, revision control, deployment, etc.
[0] https://github.com/NVIDIA/NeMo
[1] https://developer.nvidia.com/riva
[2] https://developer.nvidia.com/nvidia-triton-inference-server