Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality. Learn more →
SpeechT5 Alternatives
Similar projects and alternatives to SpeechT5 based on common topics and language
-
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
-
Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
SpeechT5 reviews and mentions
-
[HELP] Speech2Speech translator with speaker voice preservation
Hey! I’m doing a somewhat similar project but for TTS / voice cloning. This might not be too relevant for you but it might be one way to solve your problem. We based our project onSpeecht5 which is a multimodal setup that can take in audio or text and output audio or text. It uses speaker embeddings to handle multiple speakers, so you could use Metas S2ST to translate audio and this model to preserve the voice by doing audio to audio speech conversion. Here’s a hugging tutorial which mentions speech conversion with speecht5 https://huggingface.co/blog/speecht5
- Nvidia Text2Video
- Foundation models for speech analysis/synthesis/modification
-
[2210.03730] SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
The idea to separate text from speech is important. Models released today: https://github.com/microsoft/SpeechT5/tree/main/SpeechUT SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
-
A note from our sponsor - InfluxDB
www.influxdata.com | 30 Apr 2024
Stats
microsoft/SpeechT5 is an open source project licensed under MIT License which is an OSI approved license.
The primary programming language of SpeechT5 is Python.
Sponsored