Suggest an alternative to vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Why do you think that https://github.com/audeering/w2v2-how-to is a good alternative to vits?