Top 23 text-to-speech Open-Source Projects
-
MockingBird
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
-
TTS
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts) (by mozilla)
-
VALL-E-X
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
-
vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
-
pyvideotrans
Translate a video from one language to another and add dubbing.
-
silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
-
DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
-
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
-
TensorFlowTTS
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
-
edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
-
Awesome-Prompt-Engineering
This repository contains hand-curated resources for Prompt Engineering, with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc.
-
espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
-
aeneas
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
-
marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
Project mention: OpenAI deems its voice cloning tool too risky for general release | news.ycombinator.com | 2024-03-31
lol this marketing technique is getting very old. https://github.com/coqui-ai/TTS is already amazing and open source.
Project mention: Ask HN: Voice ID adoption at financial institutions | news.ycombinator.com | 2024-04-03
Given the inevitability of easy voice cloning[1], it seems irresponsible to be using voice as a positive authentication signal.
Unfortunately, major US financial institutions seem to be ramping up adoption of this technology[2].
Am I missing something?
[1] https://github.com/myshell-ai/OpenVoice
It's indeed suspicious. You're sending your voice samples, your various services accounts, your location and more private data to some proprietary black box in some public cloud. Sorry, but this is a privacy nightmare. It should be open source and self-hosted like Mycroft (https://mycroft.ai) or Leon (https://getleon.ai) to be trustworthy.
Coqui-ai was a commercial continuation of Mozilla TTS and STT (https://github.com/mozilla/TTS).
At the time (2018-ish), it was really impressive for on-device voice synthesis (with a quality approaching the Google and Azure cloud-based voice synthesis options) and open source, so a lot of people in the FOSS community were hoping it could be used for a privacy-respecting home assistant, Linux speech synthesis that doesn't suck, etc.
After Mozilla abandoned the project, Coqui continued development and had some really impressive one-shot voice cloning, but pivoted to marketing speech synthesis for game developers. They were probably having trouble monetizing it, and it doesn't surprise me that they shut down.
An equivalent project that's still in active development and doing really well is Piper TTS (https://github.com/rhasspy/piper).
And the voice encapsulation system VITS https://github.com/jaywalnut310/vits
Project mention: Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning | news.ycombinator.com | 2023-10-02
I doubt it's currently actually "the best open source text to speech", but the answer I came up with when throwing a couple of hours at the problem some months ago was "Silero" [0, 1].
Following the "standalone" guide [2], it was pretty trivial to make the model render my sample text in about 100 English "voices" (many of which were similar to each other, and in varying quality). Sampling those, I got about 10 that were pretty "good". And maybe 6 that were the "best ones" (pretty natural, not annoying to listen to).
IIRC the license was free for noncommercial use only. I'm not sure exactly "how open source" they are, but it was simple to install the dependencies and write the basic Python to try it out; I had to write a for loop to try all the voices like I wanted. I ended up using something else for the project for other reasons, but this could still be a fairly good backup option for some use cases IMO.
[0] https://github.com/snakers4/silero-models#text-to-speech
Project mention: WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper | news.ycombinator.com | 2024-01-17
If you're not already aware, the primary developer of Mimic 3 (and its non-Mimic predecessor Larynx) continued TTS-related development with Larynx and the renamed project Piper: https://github.com/rhasspy/piper
Last year Piper development was supported by Nabu Casa for their "Year of Voice" project for Home Assistant and it sounds like Mike Hansen is going to continue on it with their support this year.
Hey HN, has anyone found a viable solution for doing this locally and offline on iOS? I'd like to offer a privacy-friendly text to speech feature to my App, and Apple's speech synthesis sounds awful compared to some newer models and TTS engines. The only thing I've found is an older TensorflowTTS example here: https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/ios
Any pointers or tips appreciated.
Project mention: [discussion] text to voice generation for textbooks (non-math part) | /r/MachineLearning | 2023-12-01
I would very much like to use it to turn the text parts of a book into audio that I could listen to while reading. I used Edge's TTS by copying a paragraph to the clipboard and passing it to edge-tts in order to listen to the text, but this causes two problems: 1. you need an internet connection and have to keep the book open; 2. it can only go paragraph by paragraph and is prone to errors, or sometimes, if you use it too much, it won't convert the full text afterwards.
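One way to work around the paragraph-by-paragraph limitation described in this post is to split the whole text up front and feed the pieces to a TTS engine one at a time. The following is only a minimal sketch using the Python standard library: the names `split_into_chunks` and `synthesize_book`, the `max_chars` limit of 1000, and the `chunk_0000.mp3` file naming are all assumptions made for illustration, and `synthesize` stands in for whatever TTS call you actually use (it is not an edge-tts function).

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Group paragraphs into chunks of roughly max_chars characters.

    Paragraphs are separated by blank lines. A paragraph longer than
    max_chars is split at sentence boundaries; a single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = " ".join(paragraph.split())  # collapse hard line wraps
        if not paragraph:
            continue
        if len(paragraph) > max_chars:
            # Oversized paragraph: fall back to sentence-level pieces.
            pieces = re.split(r"(?<=[.!?])\s+", paragraph)
        else:
            pieces = [paragraph]
        for piece in pieces:
            if current and len(current) + 1 + len(piece) > max_chars:
                chunks.append(current)
                current = piece
            else:
                current = f"{current} {piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks


def synthesize_book(
    text: str,
    synthesize: Callable[[str, str], None],
    max_chars: int = 1000,
) -> List[str]:
    """Call a user-supplied (hypothetical) TTS function once per chunk.

    `synthesize(chunk_text, output_path)` is an assumed signature; wrap
    your real TTS engine so that it matches. Returns the output paths.
    """
    paths: List[str] = []
    for index, chunk in enumerate(split_into_chunks(text, max_chars)):
        path = f"chunk_{index:04d}.mp3"
        synthesize(chunk, path)
        paths.append(path)
    return paths
```

Keeping the chunks small and writing one file per chunk means a failed request only costs a single chunk, which can be retried without converting the whole book again. It does not remove the need for an internet connection if the underlying engine is an online service.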
Yes, there are a lot of different resources online, especially for generative AI. The Awesome Prompt Engineering github is probably a good place to start https://github.com/promptslab/Awesome-Prompt-Engineering. If you're focusing directly on OpenAI's models then the OpenAI Prompt Engineering Guide would be my recommendation https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api.
I'm skeptical about a senior JavaScript developer claiming to be bored. Nonetheless, let's see. How would you go about modifying [this](ng/blob/master/emscripten/espeakng_glue.idl) IDL file, this C++ glue code, and the relevant Make file to compile eSpeak NG to JavaScript with Emscripten with SSML support enabled?
Project mention: I've open sourced my Flutter plugin to run on-device LLMs on any platform. TestFlight builds available now. | /r/FlutterDev | 2023-12-08
And more stuff I’m often checking back on: - https://github.com/staghado/vit.cpp - https://github.com/serp-ai/bark-with-voice-clone - https://github.com/leejet/stable-diffusion.cpp (generate images) - etc … there’s too much fun stuff out there. Wish I had more free time haha.
I did find one but I am not sure if the link is still reliable https://marytts.github.io/.
For our real-time TTS needs, we'll employ the fantastic library called gTTS.
text-to-speech related posts
- Show HN: I ported Suno AI's Bark model in C for fast realistic audio generation
- Bark.cpp: Port of Suno AI's Bark in C/C++ for fast inference
- babel_fish - real time language translation
- Ask HN: Voice ID adoption at financial institutions
- OpenVoice: Versatile Instant Voice Cloning
- OpenAI deems its voice cloning tool too risky for general release
- OpenAI: Navigating the Challenges and Opportunities of Synthetic Voices
Index
What are some of the best open-source text-to-speech projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | MockingBird | 33,796 |
2 | TTS | 29,174 |
3 | OpenVoice | 17,263 |
4 | Leon | 14,539 |
5 | TTS | 8,784 |
6 | VALL-E-X | 7,138 |
7 | EmotiVoice | 6,270 |
8 | vits | 6,230 |
9 | pyvideotrans | 5,556 |
10 | silero-models | 4,534 |
11 | DiffSinger | 4,102 |
12 | piper | 3,902 |
13 | Amphion | 3,898 |
14 | TensorFlowTTS | 3,697 |
15 | edge-tts | 3,503 |
16 | Awesome-Prompt-Engineering | 3,196 |
17 | vall-e | 2,868 |
18 | espeak-ng | 2,858 |
19 | bark-with-voice-clone | 2,798 |
20 | aeneas | 2,379 |
21 | Tacotron-2 | 2,231 |
22 | marytts | 2,208 |
23 | gTTS | 2,139 |