Pink-Trombone
floss-various-contribs
 | Pink-Trombone | floss-various-contribs
---|---|---
Mentions | 2 | 2
Stars | 151 | -
Growth | - | -
Activity | 5.2 | -
Last commit | 5 months ago | -
Language | JavaScript | -
License | GNU General Public License v3.0 only | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Pink-Trombone
- ESpeak-ng: speech synthesizer with more than one hundred languages and accents
Too late to edit, but to anyone who needs "convincing" of the flexibility of a formant synthesizer, you should 1) play with Pink Trombone[1], a JavaScript formant synthesizer with a UI that lets you graphically manipulate a vocal tract, and 2) have a look at this programmable version of it[2]
[1] https://dood.al/pinktrombone/
[2] https://github.com/zakaton/Pink-Trombone
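For readers unfamiliar with the term, the general idea of formant synthesis can be sketched in a few lines: a pulse train stands in for the vocal folds and is passed through a cascade of resonators, one per formant. This is a minimal illustration only and is not Pink Trombone's code; the function names, the default pitch and sample rate, and the formant bandwidths are all assumptions made for the example, and the formant frequencies are rough textbook figures for an "ah"-like vowel.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonant filter: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius (< 1, so stable)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    c = -r * r
    a = 1.0 - b - c  # scale so that the gain at 0 Hz is 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, pitch_hz=110.0, duration_s=0.5, sample_rate=16000):
    """Pass an impulse train through one resonator per (frequency, bandwidth) pair."""
    n = int(duration_s * sample_rate)
    period = int(sample_rate / pitch_hz)
    # Crude glottal source: one unit impulse per pitch period.
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalise to the range [-1, 1]


# Illustrative values only: (formant frequency in Hz, bandwidth in Hz).
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]
samples = synthesize_vowel(AH_FORMANTS)
```

Changing the formant frequencies changes the perceived vowel, which is the kind of flexibility the comment refers to; the resulting samples can be written out with the standard `wave` module to listen to them.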
- How to convert phonetic units into words by writing some code (general software approach)?
Can I do it automatically without any audio/voice recordings at all, for example by taking advantage of the Pink Trombone linguistics tool (for which there is some source code)?
floss-various-contribs
- ESpeak-ng: speech synthesizer with more than one hundred languages and accents
Yeah, it would be nice if the financial backing behind Rhasspy/Piper led to improvements in espeak-ng too, but based on my own development-related experience with the espeak-ng code base (related elsewhere in the thread), I suspect it would be significantly easier to extract the specific required text-to-phonemes functionality, or (to a certain degree) reimplement it (or use a different project as a base[3]), than to more closely/fully integrate changes with espeak-ng itself[4]. :/
It seems Piper currently abstracts its phonemize-related functionality with a library[0] that makes use of an espeak-ng fork[1].
Unfortunately it also seems license-related issues may have an impact[2] on whether Piper continues to make use of espeak-ng.
For your specific example of handling 1984 as a year, my understanding is that espeak-ng can handle situations like that via parameters/configuration, but in my experience there can be unexpected interactions between different configuration/API options[6].
[0] https://github.com/rhasspy/piper-phonemize
[1] https://github.com/rhasspy/espeak-ng
[2] https://github.com/rhasspy/piper-phonemize/issues/30#issueco...
[3] Previously I've made note of some potential options here: https://gitlab.com/RancidBacon/notes_public/-/blob/main/note...
[4] For example, as I note here[5] there are currently at least four different ways to access espeak-ng's phoneme-related functionality--and it seems that they all differ in their output, sometimes consistently and other times dependent on configuration (e.g. audio output mode, spoken punctuation) and probably also input. :/
[5] https://gitlab.com/RancidBacon/floss-various-contribs/-/blob...
[6] For example, see my test cases for some other numeric-related configuration options here: https://gitlab.com/RancidBacon/floss-various-contribs/-/blob...
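To make the "1984 as a year" case above concrete, here is a small sketch of what a year-style reading involves: the number is split into two two-digit halves instead of being read as a cardinal ("one thousand nine hundred eighty-four"). This is a hypothetical helper written for illustration; it is not espeak-ng's implementation or API, and the name `year_to_words` and the supported range (1100 to 1999) are assumptions of the example.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def year_to_words(year):
    """Hypothetical year-style reading for 1100..1999 (illustration only)."""
    if not 1100 <= year <= 1999:
        raise ValueError("this sketch only covers the years 1100 to 1999")
    high, low = divmod(year, 100)
    if low == 0:
        return two_digits(high) + " hundred"       # 1900 is read "nineteen hundred"
    if low < 10:
        return two_digits(high) + " oh " + ONES[low]  # 1905 is read "nineteen oh five"
    return two_digits(high) + " " + two_digits(low)


print(year_to_words(1984))  # nineteen eighty-four
```

A real text front end also has to decide when a four-digit number is a year at all, which is where the configuration options mentioned in the comment come in.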
What are some alternatives?
espeak-ng - eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
xVA-Synth - Machine learning based speech synthesis Electron app, with voices from specific characters from video games
web-speech-synthesis-and-recognition - Speech to Text and Text to Speech on a web browser
audioworklet-polyfill - 🔊 Polyfill AudioWorklet using the legacy ScriptProcessor API.