- CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet for a given image
- BLIP: PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
So there's CLIP (Contrastive Language-Image Pretraining), which is what I thought this was referring to. And then there's CLIP Guided Stable Diffusion, which "can help to generate more realistic images by guiding stable diffusion at every denoising step with an additional CLIP model" — that just reuses the same CLIP model.
Then there's also BLIP (Bootstrapping Language-Image Pre-training).
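The distinction matters for the "describe my photo" use case: CLIP doesn't generate text, it embeds an image and candidate captions into a shared space and ranks the captions by cosine similarity, whereas BLIP can generate a caption outright. A minimal sketch of CLIP's scoring step, using random stand-in vectors in place of the real image- and text-encoder outputs:

```python
import numpy as np

def cosine_scores(image_emb, text_embs):
    """Rank candidate captions by cosine similarity, as CLIP does."""
    img = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    return img @ txt.T  # shape: (num_images, num_captions)

rng = np.random.default_rng(0)
image_emb = rng.normal(size=(1, 512))  # stand-in for CLIP's image embedding
text_embs = rng.normal(size=(3, 512))  # stand-ins for three caption embeddings

scores = cosine_scores(image_emb, text_embs)
best = int(scores.argmax())  # index of the most relevant caption
```

In the actual model the embeddings come from CLIP's trained encoders (e.g. via the `openai/CLIP` repo or Hugging Face `transformers`), and the similarities are what the contrastive training objective optimizes.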
Related posts
- Is there a website where you can upload a photo and get the description in a paragraph?
- GPT-4 shows emergent Theory of Mind on par with an adult's. It scored in the 85th+ percentile on many major college exams, and it can also do taxes and create functional websites from a simple drawing
- meme
- Object Recognition for Photo Metadata
- Stable-diffusion in Nix