Voice100 includes neural TTS/ASR models. Voice100 inference is cheap because its models are tiny and rely only on CNNs, with no autoregression.
Why do you think that https://github.com/CorentinJ/Real-Time-Voice-Cloning is a good alternative to Voice100?