Language Models Can See: Plugging Visual Controls in Text Generation
Why do you think that https://github.com/j-min/CLIP-Caption-Reward is a good alternative to MAGIC?