Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Why do you think that https://github.com/showlab/Awesome-Video-Diffusion is a good alternative to SpeechT5?