Suggest an alternative to

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Why do you think that https://github.com/showlab/Awesome-Video-Diffusion is a good alternative to SpeechT5?