SpeechT5 Alternatives

Similar projects and alternatives to SpeechT5 based on common topics and language

PaddleSpeech

Mentions: 6 | Stars: 10,161 | Activity: 6.8 | Language: Python | SpeechT5 VS PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Awesome-Video-Diffusion

Mentions: 1 | Stars: 2,426 | Activity: 8.9 | SpeechT5 VS Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
espnet

Mentions: 15 | Stars: 7,872 | Activity: 10.0 | Language: Python | SpeechT5 VS espnet

End-to-End Speech Processing Toolkit
CLAP

Mentions: 1 | Stars: 1,149 | Activity: 6.0 | Language: Python | SpeechT5 VS CLAP

Contrastive Language-Audio Pretraining (by LAION-AI)
NeMo

Mentions: 29 | Stars: 10,084 | Activity: 9.8 | Language: Python | SpeechT5 VS NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
AudioMAE

Mentions: 1 | Stars: 470 | Activity: 0.6 | Language: Python | SpeechT5 VS AudioMAE

This repo hosts the code and models of "Masked Autoencoders that Listen".

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a better SpeechT5 alternative or higher similarity.

SpeechT5 reviews and mentions

Posts with mentions or reviews of SpeechT5. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-19.

[HELP] Speech2Speech translator with speaker voice preservation
1 project | /r/learnmachinelearning | 20 May 2023

Hey! I’m doing a somewhat similar project but for TTS / voice cloning. This might not be too relevant for you, but it might be one way to solve your problem. We based our project on SpeechT5, which is a multimodal setup that can take in audio or text and output audio or text. It uses speaker embeddings to handle multiple speakers, so you could use Meta’s S2ST to translate the audio and this model to preserve the voice by doing audio-to-audio speech conversion. Here’s a Hugging Face tutorial which mentions speech conversion with SpeechT5: https://huggingface.co/blog/speecht5
Nvidia Text2Video
2 projects | /r/StableDiffusion | 19 Apr 2023
Foundation models for speech analysis/synthesis/modification
3 projects | /r/speechtech | 11 Apr 2023
[2210.03730] SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
1 project | /r/speechtech | 26 Oct 2022

The idea to separate text from speech is important. Models released today: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT
Stats

Basic SpeechT5 repo stats

Mentions

Stars: 1,018

Activity: 7.9

Last Commit: 6 days ago

microsoft/SpeechT5 is an open source project licensed under MIT License which is an OSI approved license.

The primary programming language of SpeechT5 is Python.
