larynx-dialogue vs piper-phonemize

larynx-dialogue

By RancidBacon

piper-phonemize

C++ library for converting text to phonemes for Piper (by rhasspy)

Project	larynx-dialogue	piper-phonemize
Mentions	4	1
Stars	-	58
Growth	-	-
Activity	-	7.7
Latest Commit	-	3 months ago
Language	-	C++
License	-	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
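
The page does not give the formula behind the activity number, so the following is only an illustrative sketch of how a recency-weighted, relative score of this kind could be computed. The half-life decay, the 0-10 scaling, and both function names are assumptions, not the site's actual method.

```python
def recency_weighted_score(commit_ages_days, half_life_days=30.0):
    """Sum commit weights that halve every `half_life_days` (assumed decay)."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def relative_activity(raw_scores, project):
    """Return ten times the fraction of tracked projects scoring lower.

    Under this assumed scaling, 9.0 means 90% of tracked projects score
    lower, i.e. the project sits in the top 10%.
    """
    mine = raw_scores[project]
    lower = sum(1 for score in raw_scores.values() if score < mine)
    return round(10.0 * lower / len(raw_scores), 1)


if __name__ == "__main__":
    # Hypothetical raw scores for ten tracked projects.
    raw = {f"project-{i}": float(i) for i in range(10)}
    print(relative_activity(raw, "project-9"))
```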

larynx-dialogue

Posts with mentions or reviews of larynx-dialogue. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-05-30.

Meaningful Nonsense: How I generate sentences
7 projects | news.ycombinator.com | 30 May 2024

This is so great--both in terms of the project & the write-up. Thanks for sharing your work! :)
A "quick" dump of some thoughts it provoked:
(1) Really like the "meaning-nonsense continuum" concept--and that neither extreme is explicitly labelled in the image immediately following where the term is introduced. :)
(And, yes, "Gravity learns about regret." is indeed kind of beautiful.)
(Aside: Adding alt text to the generated images would be beneficial both for copy/pasting & accessibility reasons. :) )
(2) This statement made me smile: "If you don’t enjoy reading and contemplating these sentences like this, we are simply very different people." Because while I very much fall into the category of "enjoy reading and contemplating these sentences" I can also imagine people who definitely don't. :D
(3) Part of the reason for (2) was because I'd already encountered a couple of the "don't work" examples where my immediate thought was "wait, how about in this situation though?". e.g. "Aligning with compass."
Which then got me thinking about stylistic, metaphor or time-period related aspects of grammar, e.g. pirate treasure map instructions written in cryptic sentence fragments.
(4) Which leads into the whole "procedurally generated diagrams/documents" aspect of the project that I also think is particularly cool. It immediately got me thinking of being in a game and walking into an inventor character's lab or a magician's library and finding all manner of technical or arcane drawings generated for either environment or game mechanic purposes (e.g. in a detective/mystery game like "Shadows of Doubt").
Then again, since I was a kid I've always had an interest in browsing technical component and, err, office stationery catalogues, so maybe it's just me. :D
It also made me think back to "Interminal" an entry to the 2020 ProcJam procedural generation game jam, which generated an airport terminal including duty-free stores with procedurally generated perfumes (including "name, visual identity and smell"): https://nothke.itch.io/interminal
(5) The "physicalization" aspect of the project to render it in a tangible form is also cool--particularly because (at least to me but maybe also to all those vinyl-owning kids with no turntable :D ) a physical form seems to amplify the "meaningfulness" of a message...
The underlying "digitalized cursive handwriting" project was also of interest--though I only skimmed its write-up in an attempt to avoid yet another rabbit hole. :)
The handwriting system seemed like it might also be amenable to animated use--which then made me think, "Wait! I've seen something about that recently...", had no idea what, but then some background retrieval process just produced the answer as I was writing, it was Noclip's recent "The Making of Pentiment" documentary: https://www.youtube.com/embed/ffIdgOBYwbc
(6) With regard to the technical execution/implementation side of things: I've observed that a project's need for text generation often raises a question of what implementation approach to use, in terms of a DIY vs pre-existing solution.
One issue which affects text generation specifically is "general solutions" often seem to tend toward specialization over time (e.g. "text generation for interactive fiction", "text generation for branching narrative-driven games", tools such as Yarn Spinner[0] & Ink[1]), which ends up making the solutions less suitable for simpler/different use cases & increases difficulty of the learning curve.
This was something I ran into during a small sub-project[2] last year where the text generation was somewhat "incidental" to the overall project goal. I started out with a "quick DIY" solution but still ended up spending enough time on that aspect that I started to wonder if I'd be better off not entirely re-inventing the wheel.
Around this time I ran into the "Blur Markup Language"[3][4][5] project which has the tag line "write text that changes" and--while I haven't yet used it--seems like it might be a promising "mid-level abstraction" solution for text generation, so thought I'd mention it as a potential option for others with text generation needs.
(7) In terms of other helpful text generation related resources, I've found various "word lists" to be of use, so thought I'd mention this "weasel words" list as a starting point: https://github.com/words/weasels/blob/main/data.txt#L14
(The repo README also links to other word lists under the same org including word categories such as "buzzwords", "filler", "hedges" & words listed in order of associated positive/negative sentiment.)
Thanks again for sharing your work & look forward to seeing where your projects go, should you share more in future. :)
---- footnotes ----
[0] https://github.com/YarnSpinnerTool/YarnSpinner
[1] https://github.com/inkle/ink
[2] I wanted to generate "scripted dialogue" samples[2a] to demonstrate the 900+ individual Text-To-Speech speaker voices in the Piper TTS[2b] LibriTTS voice model[2c], in a form that is: useful for evaluating the voices; not incredibly tedious to listen to; and, makes it possible to identify which speaker you are currently hearing.
[2a] Subset of resulting generated[2d] speech output can be heard in the second example here: https://rancidbacon.gitlab.io/piper-tts-demos/
[2b] https://github.com/rhasspy/piper
[2c] https://huggingface.co/rhasspy/piper-voices/blob/main/en/en_...
[2d] Text generation script: https://gitlab.com/RancidBacon/larynx-dialogue/-/blob/featur...
[3] https://bml-lang.org
[4] BML intro/overview: https://bml-lang.org/docs/guide/language-basics/
[5] Online BML editor with syntax cheat sheet: https://bml-lang.org/sandbox/

ChatTTS-Best TTS Model
8 projects | news.ycombinator.com | 28 May 2024

My interest in TTS is around "indie" game creation, animation and "radio plays".
A couple of years ago I started development of a tool to help with the generation of game audio such as NPC dialogue, "barks" or narration for those without access to/budget for human voice actors: https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to...
One thing I found interesting is that writing a small "scene" and then hearing dialogue being spoken by a variety of voices often prompted the writing of further lines of dialogue in response to perceived emotion contained in voices in the generated output. Plus it was just fun. :)
The version of the tool on that page is based on Larynx TTS which has continued development more recently as Piper TTS: https://github.com/rhasspy/piper
I'm yet to publish my port which uses Piper TTS though: https://gitlab.com/RancidBacon/larynx-dialogue/-/tree/featur...
Though I did upload some sample output (including some "radio announcer" samples in response to a HN comment :) ): https://rancidbacon.gitlab.io/piper-tts-demos/
Obviously there are variations in voice quality, and the ability to control expression is currently limited, but it beats hearing my own voice. :D

ESpeak-ng: speech synthesizer with more than one hundred languages and accents
21 projects | news.ycombinator.com | 1 May 2024

Based on my own recent experience[0] with espeak-ng, IMO the project is currently in a really tough situation[3]:
* the project seems to provide real value to a huge number of people who rely on it for reasons of accessibility (even more so for non-English languages); and,
* the project is a valuable trove of knowledge about multiple languages--collected & refined over multiple decades by both linguistic specialists and everyday speakers/readers; but...
* the project's code base is very much of "a different era" reflecting its mid-90s origins (on RISC OS, no less :) ) and a somewhat piecemeal development process over the following decades--due in part to a complex Venn diagram of skills, knowledge & familiarity required to make modifications to it.
Perhaps the prime example of the last point is that `espeak-ng` has a hand-rolled XML parser--which attempts to handle both valid & invalid SSML markup--and markup parsing is interleaved with internal language-related parsing in the code. And this is implemented in C.
[Aside: Due to this I would strongly caution against feeding "untrusted" input to espeak-ng in its current state but unfortunately that's what most people who rely on espeak-ng for accessibility purposes inevitably do while browsing the web.]
[TL;DR: More detail/repros/observations on espeak-ng issues here:
* https://gitlab.com/RancidBacon/floss-various-contribs/-/blob...
* https://gitlab.com/RancidBacon/floss-various-contribs/-/blob...
* https://gitlab.com/RancidBacon/notes_public/-/blob/main/note...
]
Contributors to the project are not unaware of the issues with the code base (which are exacerbated by the difficulty of even tracing the execution flow in order to understand how the library operates) nor that it would benefit from a significant refactoring effort.
However, as is typical with such projects which greatly benefit individual humans but don't offer an opportunity to generate significant corporate financial return, a lack of developers with sufficient skill/knowledge/time to devote to a significant refactoring means a "quick workaround" for a specific individual issue is often all that can be managed.
This is often exacerbated by outdated/unclear/missing documentation.
IMO there are two contribution approaches that could help the project moving forward while requiring the least amount of specialist knowledge/experience:
* Improve visibility into the code by adding logging/tracing to make it easier to see why a particular code path gets taken.
* Integrate an existing XML parser as a "pre-processor" to ensure that only valid/"sanitized"/cleaned-up XML is passed through to the SSML parsing code--this would increase robustness/safety and facilitate future removal of XML parsing-specific workarounds from the code base (leading to less tangled control flow) and potentially future removal/replacement of the entire bespoke XML parser.
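
The pre-processor idea in the second bullet could look roughly like the sketch below. `sanitize_ssml` is a hypothetical helper, not part of espeak-ng, and using Python's standard-library parser is an assumption made for brevity: the Python documentation itself warns that `xml.etree.ElementTree` is not hardened against maliciously constructed data, so a hardened parser would be a better fit for genuinely untrusted input. The fallback policy (drop anything tag-like, escape the rest) is also just one possible choice.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Anything that looks like a tag; used only on the fallback path.
TAG_LIKE = re.compile(r"<[^>]*>")


def sanitize_ssml(markup: str) -> str:
    """Return well-formed markup that is safe to hand to a stricter SSML stage.

    Well-formed input is parsed and re-serialized. Anything else has its
    tag-like spans removed and the remaining text escaped and wrapped in
    a single <speak> element.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        plain_text = TAG_LIKE.sub("", markup)
        return "<speak>" + escape(plain_text) + "</speak>"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(sanitize_ssml("<speak>Hi</speak>"))
    print(sanitize_ssml("<speak>Hello <b>world</speak>"))
```

With a step like this in front of it, the downstream SSML code would only ever see well-formed input, which is what would make the invalid-markup workarounds removable over time.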
Of course, the project is not short on ideas/suggestions for how to improve the situation but, rather, direct developer contributions so... shrug
In light of this, last year when I was developing the personal project[0] which made use of a dependency that in turn used espeak-ng, I wanted to try to contribute something more tangible than just "ideas", so began to write up & create reproductions for some of the issues I encountered while using espeak-ng and at least document the current behaviour.
Unfortunately while doing so I kept encountering new issues which would lead to the start of yet another round of debugging to try to understand what was happening in the new case.
Perhaps inevitably this effort eventually stalled--due to a combination of available time, a need to attempt to prioritize income generation opportunities and the downsides of living with ADHD--before I was able to share the fruits of my research. (Unfortunately I seem to be way better at discovering & root-causing bugs than I am at writing up the results...)
However, I just now used the espeak-ng project being mentioned on HN as a catalyst to at least upload some of my notes/repros to a public repo (see links in TL;DR section above) in the hope that maybe they will be useful to someone who might have the time/inclination to make a more direct code contribution to the project. (Or, you know, prompt someone to offer to fund my further efforts in this area... :) )
[0] A personal project to "port" my "Dialogue Tool for Larynx Text To Speech" project[1] to use the more recent Piper TTS[2] system which makes use of espeak-ng for transforming text to phonemes.
[1] https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to... & https://gitlab.com/RancidBacon/larynx-dialogue/-/tree/featur...
[2] https://github.com/rhasspy/piper
[3] Very much no shade toward the project intended.

Home Assistant’s Year of the Voice – Chapter 2
7 projects | news.ycombinator.com | 27 Apr 2023

My interest in offline TTS is actually entirely unrelated to the automation space: I'm interested in Text to Speech for creative pursuits, such as video game voice dialogue and animated videos.
This is one of the reasons why the range & quantity of available voices is particularly important to me.
After all, you can't really have a scene set in a board room with nine characters[3] if you've only got three voices to go around. :)
I've actually been spending time this week on updating my "Dialogue Tool"[1] application (originally created to work with Larynx to help with narrative dialogue workflows such as voice "auditioning", intelligent caching & multiple voice recordings) to work with Piper.
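
The comment does not say how the tool's caching works. As a rough illustration of caching generated speech, one simple scheme is to key each recording on the voice and the line of text, so that an unchanged line is never synthesized twice. Everything here is an assumption: `cache_path`, `get_or_synthesize`, and the `synthesize` callback are hypothetical names, and the callback stands in for whatever actually calls the TTS engine.

```python
import hashlib
from pathlib import Path


def cache_path(cache_dir, voice, text):
    """Derive a stable file name from the voice and the exact line text."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.wav"


def get_or_synthesize(cache_dir, voice, text, synthesize):
    """Return the cached recording, calling `synthesize` only on a miss.

    `synthesize(voice, text)` is a caller-supplied function (hypothetical)
    that returns the audio as bytes.
    """
    path = cache_path(cache_dir, voice, text)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(synthesize(voice, text))
    return path
```

Editing one line of a scene then only regenerates that line, while auditioning a different voice for the same line produces a separate cache entry.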
Which is where I ran into the question of how to navigate/curate a collection of 900+ voices.
The main approaches I'm using so far are:
(1) Random luck--just audition a bunch of different voices with your sample dialogue & see what you like.
(2) Curation/sorting based on quality-related meta-data from the original dataset.
(3) Generating a different dialogue line for each voice that includes their speaker number for identification purposes that also (hopefully) isn't tedious to listen to for 900+ voices. :)
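
A minimal sketch of approach (3), assuming a small set of hand-written templates. The templates and the `line_for_speaker` helper are made up for illustration and are not taken from the actual text generation script.

```python
import random

# Hypothetical templates; each one embeds the speaker number so the
# listener can tell which voice is currently speaking.
TEMPLATES = [
    "Hello, I am speaker {n}, and I would like to report a missing umbrella.",
    "Speaker {n} here. Has anyone seen the keys to the observatory?",
    "This is speaker {n}. The kettle is on, if anybody wants tea.",
    "Good evening from speaker {n}. The last train leaves at midnight.",
]


def line_for_speaker(speaker_id: int, seed: int = 0) -> str:
    """Pick a template pseudo-randomly but reproducibly for each speaker."""
    rng = random.Random(seed * 100003 + speaker_id)
    return rng.choice(TEMPLATES).format(n=speaker_id)


if __name__ == "__main__":
    for speaker_id in range(3):
        print(line_for_speaker(speaker_id))
```

Seeding per speaker keeps the output stable between runs, which helps when only some voices need to be regenerated.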
I haven't quite finished/uploaded results from (3) yet but example output based on approaches (3) & (2) can be heard here: https://rancidbacon.gitlab.io/piper-tts-demos/
The recording has two sets of 10 voices which had the lowest Word Error Rate scores in the original dataset--which doesn't mean the resulting voice model is necessarily good but is at least a starting point for exploring.
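
Selecting the lowest-WER voices, as in approach (2), amounts to a sort on one metadata field. The sketch below assumes the metadata has already been loaded into records with a speaker ID and a WER value. The `Speaker` type, its field names, and the sample numbers are hypothetical and do not reflect the dataset's real schema.

```python
from typing import NamedTuple


class Speaker(NamedTuple):
    speaker_id: int
    wer: float  # Word Error Rate from the dataset metadata; lower is better


def lowest_wer(speakers, count=10):
    """Return the `count` speakers with the lowest Word Error Rate."""
    return sorted(speakers, key=lambda s: s.wer)[:count]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [Speaker(1, 0.12), Speaker(2, 0.03), Speaker(3, 0.07)]
    for speaker in lowest_wer(sample, count=2):
        print(speaker.speaker_id, speaker.wer)
```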
I'd also like to explore more analysis-based approaches for grouping/curation (e.g. vocal characteristics such as "softer", "lower", "older") but as I'm not getting paid for this[2], that's likely a longer term thing.
A different approach which I've previously found really interesting is to use voices as a prompt for writing narrative dialogue. It really helps to hear the dialogue as you write it and the nuances of different voices can help spur ideas for where a conversation goes next...
[1] See: https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to... & https://gitlab.com/RancidBacon/larynx-dialogue/-/tree/featur...
[2] Am currently available/open to be though. :D
[3] Will try to upload some example audio of this scene because I found it pretty funny. :)

piper-phonemize

Posts with mentions or reviews of piper-phonemize. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-05-01.

ESpeak-ng: speech synthesizer with more than one hundred languages and accents
21 projects | news.ycombinator.com | 1 May 2024

Yeah, it would be nice if the financial backing behind Rhasspy/Piper led to improvements in espeak-ng too but based on my own development-related experience with the espeak-ng code base (related elsewhere in the thread) I suspect it would be significantly easier to extract the specific required text to phonemes functionality or (to a certain degree) reimplement it (or use a different project as a base[3]) than to more closely/fully integrate changes with espeak-ng itself[4]. :/
It seems Piper currently abstracts its phonemize-related functionality with a library[0] that currently makes use of an espeak-ng fork[1].
Unfortunately it also seems license-related issues may have an impact[2] on whether Piper continues to make use of espeak-ng.
For your specific example of handling 1984 as a year, my understanding is that espeak-ng can handle situations like that via parameters/configuration but in my experience there can be unexpected interactions between different configuration/API options[6].
[0] https://github.com/rhasspy/piper-phonemize
[1] https://github.com/rhasspy/espeak-ng
[2] https://github.com/rhasspy/piper-phonemize/issues/30#issueco...
[3] Previously I've made note of some potential options here: https://gitlab.com/RancidBacon/notes_public/-/blob/main/note...
[4] For example, as I note here[5] there are currently at least four different ways to access espeak-ng's phoneme-related functionality--and it seems that they all differ in their output, sometimes consistently and other times dependent on configuration (e.g. audio output mode, spoken punctuation) and probably also input. :/
[5] https://gitlab.com/RancidBacon/floss-various-contribs/-/blob...
[6] For example, see my test cases for some other numeric-related configuration options here: https://gitlab.com/RancidBacon/floss-various-contribs/-/blob...
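
One way to see this kind of output difference for yourself is to ask the `espeak-ng` command-line tool for phonemes in two of its output modes and compare the results. This is a sketch only: it uses the `-q` (no audio), `-v` (voice), `-x` (phoneme mnemonics), and `--ipa` options, and it is an assumption that these two modes are among the access paths counted in footnote [4]. The helper names are hypothetical, and nothing is run unless `espeak-ng` is installed.

```python
import shutil
import subprocess


def phoneme_commands(text, voice="en-us"):
    """Build two espeak-ng invocations that print phonemes without audio."""
    return {
        "mnemonics": ["espeak-ng", "-q", "-v", voice, "-x", text],
        "ipa": ["espeak-ng", "-q", "-v", voice, "--ipa", text],
    }


def run_phoneme_commands(text, voice="en-us"):
    """Run each command and collect its output; empty if espeak-ng is absent."""
    if shutil.which("espeak-ng") is None:
        return {}
    results = {}
    for name, command in phoneme_commands(text, voice).items():
        done = subprocess.run(command, capture_output=True, text=True, check=False)
        results[name] = done.stdout.strip()
    return results


if __name__ == "__main__":
    for name, output in run_phoneme_commands("In 1984 it rained.").items():
        print(f"{name}: {output}")
```

Note that text beginning with a hyphen may be read as an option by the CLI, so real input would need guarding.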
