feed_forward_vqgan_clip vs VQGAN-CLIP-Video

| | feed_forward_vqgan_clip | VQGAN-CLIP-Video |
|---|---|---|
| Mentions | 4 | 1 |
| Stars | 136 | 22 |
| Growth | - | - |
| Activity | 3.7 | 1.8 |
| Last commit | 4 months ago | about 2 years ago |
| Language | Python | Python |
| License | MIT License | MIT License |
- Stars: the number of stars that a project has on GitHub.
- Growth: month-over-month growth in stars.
- Activity: a relative number indicating how actively a project is being developed; recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects being tracked.
Mentions of feed_forward_vqgan_clip
[D] Hosting AI Art Generative ML Model
I suspect WOMBO uses the feed-forward inference approach to VQGAN+CLIP (instead of finetuning, it predicts the final z latent vector for a given text input), which is why their outputs are less sophisticated. Because of that approach, there are many deployment optimizations you can do to speed it up, though they may be complicated.
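To make the distinction concrete, here is a minimal PyTorch sketch of the feed-forward idea: a small network maps a CLIP text embedding directly to a VQGAN latent, so generation is a single forward pass instead of a per-prompt optimization loop. The `TextToLatent` module, its dimensions, and the latent shape are illustrative assumptions, not the actual architecture used by feed_forward_vqgan_clip.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

class TextToLatent(nn.Module):
    """Hypothetical feed-forward mapper from a CLIP text embedding to a
    VQGAN latent z. The real model in feed_forward_vqgan_clip may differ;
    this only illustrates the "predict z directly" idea."""

    def __init__(self, clip_dim=512, z_channels=256, z_spatial=16):
        super().__init__()
        self.z_shape = (z_channels, z_spatial, z_spatial)
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, 1024),
            nn.GELU(),
            nn.Linear(1024, z_channels * z_spatial * z_spatial),
        )

    def forward(self, text_embedding):
        return self.mlp(text_embedding).view(-1, *self.z_shape)

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
mapper = TextToLatent().to(device)  # untrained here; training is the one-time cost

tokens = clip.tokenize(["an armchair in the shape of an avocado"]).to(device)
with torch.no_grad():
    text_emb = clip_model.encode_text(tokens).float()
    z = mapper(text_emb)  # a single forward pass, no per-prompt optimization
# image = vqgan.decode(z)  # decode with a pretrained VQGAN (not shown)
```

Training such a mapper (not shown) would minimize a CLIP-similarity loss between decoded images and their prompts over a large caption dataset; that one-time cost is what buys near-instant inference later.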
A small experiment on how changes in a text prompt can affect the output image in a CLIP-based system
The system used to produce these images is unlike most other VQGAN+CLIP systems: it uses a neural network trained by the developer(s) rather than an iterative optimization process. This system is known to have a "formula" for image layout.
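Below is a sketch of how such a prompt-variation experiment could be scripted. The `generate` function is a hypothetical stand-in for the feed-forward model's inference call; here it returns a flat gray image so the script runs end to end.

```python
from PIL import Image

def generate(prompt: str) -> Image.Image:
    # Placeholder: swap in the pretrained feed-forward model's inference call.
    # A flat gray image stands in so this sketch runs end to end.
    return Image.new("RGB", (256, 256), "gray")

base = "a painting of a {} in the forest"
variants = ["fox", "red fox", "sleeping fox", "robot fox"]
images = [generate(base.format(word)) for word in variants]

# Tile the outputs side by side so shared layout (the "formula" mentioned
# above) is easy to spot across prompt variants.
grid = Image.new("RGB", (sum(im.width for im in images),
                         max(im.height for im in images)))
x = 0
for im in images:
    grid.paste(im, (x, 0))
    x += im.width
grid.save("prompt_variants.png")
```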
Get a VQGAN output image for a given text description almost instantly (not including one-time setup time) using the Colab notebook "Feed Forward VQGAN CLIP - Using a pretrained model" from mehdidc. Here are 20 non-cherry-picked images from the notebook. Details in a comment.
Hello, some news. For those who are interested, I released new models (release 0.2) that you can try; depending on the prompt, you might find them better than the current one(s). The problem that /u/Wiskkey mentioned (object parts appearing systematically in the top-left) is also less visible, though still not 100% solved: a common global structure can still be identified, but it is more centered in the image. The Colab notebook was updated to use the new models.
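A hedged sketch of what using one of the released checkpoints might look like. The filename, the assumption that the checkpoint loads via `torch.load` as a callable module, and the model's input/output are all guesses; the Colab notebook mentioned above has the authoritative loading code.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: the release ships a pickled PyTorch module that maps CLIP text
# features to an image tensor. The filename below is made up; check the
# release page and the notebook for the real one.
model = torch.load("release_0.2_model.th", map_location=device)
model.eval()

clip_model, _ = clip.load("ViT-B/32", device=device)
tokens = clip.tokenize(["a fantasy castle at sunset"]).to(device)
with torch.no_grad():
    text_emb = clip_model.encode_text(tokens).float()
    image = model(text_emb)  # a single forward pass, hence "almost instantly"
```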
Mentions of VQGAN-CLIP-Video
What are some alternatives?
VQGAN-CLIP - Just playing with getting VQGAN+CLIP running locally, rather than having to use Colab. (This is the iterative approach; see the sketch after this list.)
frame-interpolation - FILM: Frame Interpolation for Large Motion, in ECCV 2022.
big-sleep - A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
optical.flow.demo - A project that uses optical flow and machine learning to detect aimhacking in video clips.
deep-daze - Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
vqgan-clip-app - Local image generation using VQGAN-CLIP or CLIP guided diffusion
Text-to-Image-Synthesis - PyTorch implementation of the Generative Adversarial Text-to-Image Synthesis paper
moviepy - Video editing with Python
DALLE-pytorch - Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
AI-Art - PyTorch (and PyTorch Lightning) implementation of Neural Style Transfer, Pix2Pix, CycleGAN, and Deep Dream!
CLIP-Guided-Diffusion - Just playing with getting CLIP Guided Diffusion running locally, rather than having to use Colab.
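For contrast with the feed-forward approach discussed above, here is a minimal sketch of the iterative VQGAN+CLIP technique used by several of these alternatives: a latent (simplified here to raw pixels so the loop is self-contained) is optimized by gradient descent against a CLIP similarity loss, one prompt at a time, which is why it is slower per image but often more detailed.

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep everything fp32 so gradients flow cleanly

tokens = clip.tokenize(["a fantasy castle at sunset"]).to(device)
with torch.no_grad():
    target = F.normalize(clip_model.encode_text(tokens), dim=-1)

# Simplification: optimize raw pixels instead of a VQGAN z-latent so the
# loop runs on its own. Real implementations optimize the latent, decode it
# with a pretrained VQGAN, and add cutouts/augmentations for stability.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    emb = F.normalize(clip_model.encode_image(image.clamp(0, 1)), dim=-1)
    loss = 1 - (emb * target).sum()  # push the image toward the text embedding
    opt.zero_grad()
    loss.backward()
    opt.step()
```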