[D] What is the best open source text to speech model?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  1. tortoise-tts

    A multi-voice TTS system trained with an emphasis on quality

    Tortoise TTS is supposed to be good. However, inference can take a while without a GPU, so it might not deliver the real-time text-to-speech you want.

  3. speechbrain

    A PyTorch-based Speech Toolkit

    I don't know if it's the best, but SpeechBrain is supposed to be state of the art.

  4. tacotron

    A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

    Tacotron | submitted: Mar 29, 2017 | paper: https://arxiv.org/pdf/1703.10135.pdf | github: https://github.com/keithito/tacotron (not the official implementation, but the one cited the most)

  5. tacotron2

    Tacotron 2 - PyTorch implementation with faster-than-realtime inference

  6. flowtron

    Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer

  7. FastSpeech2

    An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

    FastSpeech2 | submitted: Jun 8, 2020 | paper: https://arxiv.org/pdf/2006.04558.pdf | github: https://github.com/ming024/FastSpeech2 (not the official implementation, but the one cited the most)

  8. NeMo

    A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

  10. waveglow

    A Flow-based Generative Network for Speech Synthesis

  11. hifi-gan

    HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

  12. radtts

    Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low Dimensional (F0 and Energy) Speech Attributes.

    RadTTS | submitted: Aug 18, 2021 (NVIDIA page, not arXiv) | paper: https://openreview.net/pdf?id=0NQwnnwAORi | github: https://github.com/NVIDIA/radtts

  13. Speech-Backbones

    This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

  14. vits

    VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

  15. STYLER

    Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021 (by keonlee9420)

  16. DiffSinger

    PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech) (by keonlee9420)

    DiffTTS (DiffSinger) | submitted: Apr 3, 2021 | paper: https://arxiv.org/pdf/2104.01409v1.pdf | github: https://github.com/keonlee9420/DiffSinger

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
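
The tortoise-tts comment in the list above notes that inference without a GPU may be too slow for real-time use. One common way to check this for any of the listed models is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1 means the model generates speech faster than it plays back. The sketch below is only an illustration under stated assumptions: `real_time_factor` and `fake_synthesize` are hypothetical names, and `fake_synthesize` is a stand-in that returns silence, not the API of any project listed here. Swap in a call to the real model that returns a sequence of audio samples.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Time one call to synthesize(text) and return (rtf, audio_seconds).

    synthesize is any callable that returns a sequence of audio samples.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds, audio_seconds


SAMPLE_RATE = 22050  # assumed output rate; use the rate your model reports


def fake_synthesize(text):
    # Hypothetical stand-in for a real TTS call: 1000 silent samples per character.
    return [0.0] * (len(text) * 1000)


if __name__ == "__main__":
    rtf, seconds = real_time_factor(fake_synthesize, "hello", SAMPLE_RATE)
    print(f"audio: {seconds:.3f} s, real-time factor: {rtf:.6f}")
```

Measuring on the hardware you actually plan to deploy on matters here, since the same model can land on either side of an RTF of 1 depending on whether a GPU is available.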

Related posts

  • WhisperSpeech – An Open Source text-to-speech system built by inverting Whisper

    9 projects | news.ycombinator.com | 17 Jan 2024
  • [D] TTS systems to download & run offline

    3 projects | /r/MachineLearning | 14 May 2023
  • Voice-generation AI published on GitHub is evolving at breakneck speed, trained on the voices of 2,890 Japanese anime characters

    4 projects | /r/r_kenmou | 2 Nov 2022
  • Voice-generation AI that can read Japanese, English, and Chinese aloud is published on GitHub and draws attention

    2 projects | /r/r_kenmou | 10 Oct 2022
  • Ask HN: What is the state of OSS voice cloning?

    6 projects | news.ycombinator.com | 30 Sep 2024
