NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

tortoise-tts

Mentions: 144 | Stars: 11,755 | Activity: 8.2 | Language: Jupyter Notebook

A multi-voice TTS system trained with an emphasis on quality

> I wonder if a style-transfer style algorithm could be used to map the intent of a sentence to a simulated voice.
There's definitely research/proprietary software that lets a person speaking in a desired manner have their voice control the expression of the generated speech.
Here's a related issue on an Open Source text-to-speech project which I only learned of today: https://github.com/neonbjb/tortoise-tts/issues/34#issue-1229...
> I tend to view most of these things through the perspective of what would help mod makers for video games
Yeah, I think there's some really cool potential in indie creatives having access to (even lower quality) voice simulation, for use in everything from the initial writing process (I find it quite interesting how engaging it is to hear one's words if that's going to be the final form, and even synthesis artifacts can prompt an emotion or thought to develop) to placeholder audio and even final audio in some cases.
> (and I suspect various open source voice sample sets would become pretty popular).
That's definitely a powerful enabler for Free/Open Source speech systems. There's a list of current data sets for speech at the "Open Speech and Language Resources" site: https://openslr.org/resources.php
Encouraging people to provide their voice for Public Domain/Open Source use does come with some ethical aspects that I think people need to be made aware of so they can make informed decisions about it.
Given your interest in this topic you might be interested in this (rough) tool I finally released last week: https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to...

larynx

Mentions: 18 | Stars: 788 | Activity: 0.0 | Language: Python

Discontinued. End-to-end text-to-speech system using gruut and ONNX

I imagine that our concept of what a villain sounds like tends to be extremely personally biased but here's a couple of options [Advisory: Contains threatening language.]:
* http://www.sndup.net/p33q
* http://www.sndup.net/sppn
I created these samples in a relatively short time using the Free/Open Source (which I think is an important factor for indies) text-to-speech project Larynx & a narrative editor I finally released the other weekend:
* https://github.com/rhasspy/larynx/
* https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to...
Now, I would really like to link you directly to audio of the next two, but since they come from a system that's currently in beta behind an (automated response) email address, I think that may not be appropriate, so, instead...
* Visit & get access to the beta here: https://mycroft.ai/blog/mimic-3-preview/
* Copy & paste this SSML into the form: https://pastebin.com/Bwd7LCbj
It's definitely a noticeable step up again in quality.
There's an alternate pair of voices if you move the "_" from one "name" attribute to the other in each "voice" element.
I intentionally didn't edit the text to remove some of the artifacts both to give a realistic impression of the current state & because sometimes they add interesting texture. :)
Note the beta voices are "low" quality.
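As a rough illustration of the kind of SSML described above (one "voice" element per speaker, each selected by its "name" attribute), here is a minimal Python sketch. It is not the contents of the linked pastebin: the voice names `voice_a` and `voice_b` and the dialogue lines are hypothetical placeholders, and a real engine will expect its own voice identifiers.

```python
import xml.etree.ElementTree as ET


def build_ssml(lines):
    """Build an SSML string with one <voice> element per (voice_name, text) pair."""
    speak = ET.Element("speak")
    for voice_name, text in lines:
        # The "name" attribute selects which voice reads the enclosed text.
        voice = ET.SubElement(speak, "voice", {"name": voice_name})
        sentence = ET.SubElement(voice, "s")
        sentence.text = text
    return ET.tostring(speak, encoding="unicode")


# Hypothetical voice names and lines, for illustration only.
dialogue = [
    ("voice_a", "You should not have come here."),
    ("voice_b", "And yet here I am."),
]
ssml = build_ssml(dialogue)
print(ssml)
```

Switching to a different pair of voices then only means changing the `name` values; the text itself stays untouched.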

TTS

Mentions: 231 | Stars: 29,174 | Activity: 9.5 | Language: Python

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

I agree so much that I've started learning ML to make a decent open source, multi-language TTS that works on smartphones.
But really, the situation is pretty good, with a lot of code and datasets available as open source. Notably, if you're not constrained to smartphones and the like, you can run quite a number of modern models on your computer; see for instance https://github.com/coqui-ai/TTS/ (which itself contains many different models).
The work that needs to be done is """just""" to turn those models into something suitable for smartphones (which will most likely include re-training), and to plug them back into Android's TTS API.

opentts

Mentions: 10 | Stars: 822 | Activity: 1.3 | Language: Python

Open Text to Speech Server

If you've not already encountered them I'd definitely encourage you to check out these Free/Open Source projects too:
* Larynx: https://github.com/rhasspy/larynx/
* OpenTTS: https://github.com/synesthesiam/opentts
* Likely Mimic3 in the near future: https://mycroft.ai/blog/mimic-3-preview/
Larynx in particular has a focus on "faster than real-time" synthesis, while OpenTTS is an attempt to package & provide a common REST API to all Free/Open Source Text To Speech systems. The aim is for the FLOSS ecosystem to build on previous work supported by short-lived business interests, rather than start from scratch every time.
AIUI the developer of the first two projects now works for Mycroft AI & is involved in the development of Mimic3 which seems very promising given how much of an impact on quality his solo work has had in just the past couple of years or so.
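To make the "common REST API" idea concrete, here is a small Python sketch of how a client might ask an OpenTTS server for audio. It assumes a server running locally on port 5500 with a `GET /api/tts` endpoint taking `voice` and `text` query parameters; treat the port, the endpoint path, and the `espeak:en` voice id as assumptions to check against the OpenTTS README for your version.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def opentts_url(text, voice, base_url="http://localhost:5500"):
    """Build the request URL for a synthesis call (endpoint and port are assumed)."""
    query = urlencode({"voice": voice, "text": text})
    return f"{base_url}/api/tts?{query}"


def synthesize(text, voice, base_url="http://localhost:5500"):
    """Fetch synthesized audio bytes; requires a running server at base_url."""
    with urlopen(opentts_url(text, voice, base_url)) as response:
        return response.read()


# Building the URL needs no server; "espeak:en" is an assumed voice id.
url = opentts_url("Hello world", "espeak:en")
print(url)
```

Because every backend sits behind the same endpoint, swapping TTS systems should only require changing the `voice` value rather than the client code.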

Thorsten-Voice

Mentions: 1 | Stars: 479 | Activity: 6.4 | Language: Python

Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing struggles.

For German users, I can recommend taking a look at
https://www.thorsten-voice.de/
https://github.com/thorstenMueller/Thorsten-Voice
where someone contributed a huge set of his voice samples and a tutorial / script collection to build a pretty decent TTS model LOCALLY.
Quality-wise it is not that good, but it's free and pretty easy to follow for a tech enthusiast.

TensorFlowTTS

Mentions: 6 | Stars: 3,697 | Activity: 0.0 | Language: Python

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

I had a lot of success using [FastSpeech2 + MB MelGAN via TensorFlowTTS](https://github.com/TensorSpeech/TensorFlowTTS). There are demos for [iOS](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/ex...) and [Android](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/ex...) which will allow you to run pretty convincing, modern TTS models with only a few hundred milliseconds of processing latency.

vosk-api

Mentions: 59 | Stars: 7,025 | Activity: 5.9 | Language: Jupyter Notebook

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

In case it's of interest, when I last explored this topic in terms of the Free/Open Source ecosystem I was very impressed with how well VOSK-API performed: https://github.com/alphacep/vosk-api
Here's another project that builds on top of VOSK to provide a tighter integration with Linux: https://github.com/ideasman42/nerd-dictation
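For a sense of what offline recognition with VOSK-API looks like from Python, here is a minimal sketch. It assumes the third-party `vosk` package is installed, that a model has been downloaded to a local directory, and that the input is a 16-bit mono PCM WAV file; the paths in the usage comment are hypothetical.

```python
import json
import wave


def extract_text(result_json):
    """Pull the recognized text out of a JSON result string."""
    return json.loads(result_json).get("text", "")


def transcribe_wav(wav_path, model_dir):
    """Transcribe a mono 16-bit PCM WAV file with a local Vosk model."""
    # Imported here so the helper above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    pieces = []
    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(model, wav.getframerate())
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(extract_text(recognizer.Result()))
        # Flush whatever audio is still buffered.
        pieces.append(extract_text(recognizer.FinalResult()))
    return " ".join(piece for piece in pieces if piece)


# Usage (hypothetical paths):
# print(transcribe_wav("recording.wav", "vosk-model-small-en-us"))
```

Everything runs locally, which is the property that tools built on top of VOSK rely on.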

nerd-dictation

Mentions: 28 | Stars: 1,158 | Activity: 3.6 | Language: Python

Simple, hackable offline speech to text - using the VOSK-API.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Tags: speech-synthesis, TTS, Python, text-to-speech, Deep Learning
Post date: 17 May 2022