[D] What is the best open source text to speech model?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • tortoise-tts

    A multi-voice TTS system trained with an emphasis on quality

  • Tortoise TTS is supposed to be good. However, inference can take a while without a GPU, so it might not produce the real-time text-to-speech effect you want.

  • speechbrain

    A PyTorch-based Speech Toolkit

  • I don't know if it's the best, but SpeechBrain is supposed to be state of the art.

  • tacotron

    A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

  • Tacotron submitted: Mar 29, 2017 paper: https://arxiv.org/pdf/1703.10135.pdf github: https://github.com/keithito/tacotron (not the official implementation, but the one cited the most)

  • tacotron2

    Tacotron 2 - PyTorch implementation with faster-than-realtime inference

  • flowtron

    Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer

  • FastSpeech2

    An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

  • FastSpeech2 submitted: Jun 8, 2020 paper: https://arxiv.org/pdf/2006.04558.pdf github: https://github.com/ming024/FastSpeech2 (not the official implementation, but the one cited the most)

  • NeMo

    A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

  • waveglow

    A Flow-based Generative Network for Speech Synthesis

  • hifi-gan

    HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

  • radtts

    Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low Dimensional (F0 and Energy) Speech Attributes.

  • RadTTS submitted: Aug 18, 2021 (NVIDIA page, not arXiv) paper: https://openreview.net/pdf?id=0NQwnnwAORi github: https://github.com/NVIDIA/radtts

  • Speech-Backbones

    This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.

  • vits

    VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

  • STYLER

    Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech, INTERSPEECH 2021 (by keonlee9420)

  • DiffSinger

    PyTorch implementation of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (focused on DiffSpeech) (by keonlee9420)

  • DiffTTS (DiffSinger) submitted: Apr 3, 2021 paper: https://arxiv.org/pdf/2104.01409v1.pdf github: https://github.com/keonlee9420/DiffSinger

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
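
The Tortoise TTS comment in the list raises the question of whether a model is fast enough for real-time use. One rough way to compare candidates on your own hardware is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1 means faster than real time. The sketch below is only an illustration, not part of any listed project: `synthesize` is a hypothetical callable that you would wrap around whichever model you are testing, and `dummy_synthesize` is a placeholder that returns silence.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (below 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Time one call to `synthesize` and compare it to the length of the returned audio."""
    start = time.perf_counter()
    samples = synthesize(text)  # hypothetical: returns raw audio samples
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


def dummy_synthesize(text: str) -> list:
    # Placeholder standing in for a real model: one second of silence at 22050 Hz.
    return [0.0] * 22050


if __name__ == "__main__":
    rtf = measure_rtf(dummy_synthesize, "Hello there.", sample_rate=22050)
    print(f"real-time factor: {rtf:.4f}")
```

A single timed call is noisy, so in practice you would probably average over several sentences and discard the first run, which often includes model loading or warm-up.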
