Feed-forward VQGAN-CLIP model, where the goal is to eliminate the need to optimize VQGAN's latent space separately for each input prompt.
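The difference between the two approaches can be sketched in miniature. This is a hedged toy illustration, not the repo's actual API: `clip_loss`, `optimize_latent`, and `feed_forward` are hypothetical stand-ins, with scalars in place of real latent tensors, showing how a trained feed-forward network amortizes the per-prompt optimization loop into a single pass.

```python
def clip_loss(z, prompt_embedding):
    # Toy stand-in for the CLIP similarity loss (hypothetical):
    # squared distance between the latent and the prompt embedding.
    return (z - prompt_embedding) ** 2

def optimize_latent(prompt_embedding, steps=100, lr=0.1):
    # Standard VQGAN-CLIP: run gradient descent on the latent z
    # from scratch for EVERY new prompt.
    z = 0.0
    for _ in range(steps):
        grad = 2 * (z - prompt_embedding)  # d(clip_loss)/dz
        z -= lr * grad
    return z

def feed_forward(prompt_embedding):
    # Feed-forward variant: a network trained in advance maps the
    # prompt embedding straight to a latent in one pass (identity
    # stand-in here, since the toy loss is minimized at z = embedding).
    return prompt_embedding

target = 3.0
z_opt = optimize_latent(target)  # many iterations per prompt
z_ff = feed_forward(target)      # single pass, no per-prompt loop
```

Both routes land on (approximately) the same latent; the feed-forward model simply pays the cost once at training time instead of at every generation.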
Why do you think that https://github.com/robobeebop/VQGAN-CLIP-Video is a good alternative to feed_forward_vqgan_clip?