-
2024 is shaping up to be the year of multimodal machine learning. From real-time text-to-image models and open-vocabulary models to multimodal large language models like GPT-4V and Gemini Pro Vision, AI is primed for an unprecedented array of interactive multimodal applications and experiences.
-
While OpenAI’s CLIP model has garnered a lot of attention, it is far from the only game in town—and far from the best! On the OpenCLIP leaderboard, for instance, the largest and most capable CLIP model from OpenAI ranks just 41st(!) in its average zero-shot accuracy across 38 datasets.
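
You can browse and load those higher-ranking checkpoints directly through the open_clip package. A minimal sketch (the architecture/tag pair loaded below is just an illustrative choice; use any pair from the list):

```python
import open_clip

# List every (architecture, pretraining tag) pair that ships with OpenCLIP;
# many of these outrank OpenAI's original weights on the leaderboard's
# 38-dataset zero-shot average
for arch, tag in open_clip.list_pretrained():
    print(arch, tag)

# Load one of the open checkpoints
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
```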
-
awesome-clip-papers
The most impactful papers related to contrastive pretraining for multimodal models!
For a comprehensive catalog of papers pushing the state of CLIP models forward, check out this Awesome CLIP Papers GitHub repository. Additionally, the Zero-shot Prediction Plugin for FiftyOne allows you to apply any of the OpenCLIP-compatible models to your own data.
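
As a rough sketch of that workflow (the model zoo route shown here; the class list is a placeholder), you can load a CLIP model through FiftyOne and apply it to a dataset:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Load a small sample dataset and a CLIP model from the FiftyOne model zoo
dataset = foz.load_zoo_dataset("quickstart")
model = foz.load_zoo_model(
    "clip-vit-base32-torch",
    classes=["cat", "dog", "bird"],  # placeholder label set for zero-shot classification
)

# Run zero-shot prediction on every sample and inspect the results in the App
dataset.apply_model(model, label_field="zero_shot_predictions")
session = fo.launch_app(dataset)
```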
-
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
(Github Repo | Most Popular Model | Paper | Project Page)
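
The repo's README shows the core usage pattern. A condensed version (the image path and candidate captions are placeholders):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate text snippets
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each candidate snippet
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(probs)  # highest probability = most relevant snippet
```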
-
StyleCLIP
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
While CLIP on its own is useful for applications such as zero-shot classification, semantic search, and unsupervised data exploration, it is also used as a building block in a vast array of multimodal applications, from Stable Diffusion and DALL-E to StyleCLIP and OWL-ViT. For most of these downstream applications, the initial CLIP model is regarded as a “pre-trained” starting point, and the entire model is fine-tuned for its new use case.
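
A minimal sketch of what such a fine-tuning step can look like, assuming batches of preprocessed images and tokenized texts from the downstream domain (the symmetric contrastive loss mirrors the original CLIP objective; the checkpoint and learning rate are illustrative):

```python
import torch
import torch.nn.functional as F
import open_clip

# Start from pretrained weights; every parameter stays trainable so the
# whole model adapts to the new domain
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def training_step(images: torch.Tensor, texts: torch.Tensor) -> float:
    """One update on a batch of preprocessed images and tokenized texts."""
    image_features = F.normalize(model.encode_image(images), dim=-1)
    text_features = F.normalize(model.encode_text(texts), dim=-1)

    # Symmetric contrastive (InfoNCE) loss over the in-batch similarity matrix
    logits = model.logit_scale.exp() * image_features @ text_features.T
    labels = torch.arange(len(images), device=images.device)
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```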
-
klite
[NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222
(Github Repo | Paper)
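
In spirit, K-LITE swaps bare class names for knowledge-enriched prompts before encoding them with the text tower. A toy illustration (the definitions dict is a stand-in for the WordNet/Wiktionary lookups the paper draws on):

```python
# Stand-in for external-knowledge retrieval; in practice these definitions
# come from sources like WordNet and Wiktionary
definitions = {
    "tench": "a freshwater fish of the carp family",
    "mamey": "a tropical fruit with sweet orange flesh",
}

def knowledge_prompt(class_name: str) -> str:
    """Enrich a bare class name with external knowledge before text encoding."""
    knowledge = definitions.get(class_name)
    if knowledge:
        return f"a photo of a {class_name}, which is {knowledge}"
    return f"a photo of a {class_name}"

print(knowledge_prompt("tench"))
# -> "a photo of a tench, which is a freshwater fish of the carp family"
```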
-
MetaCLIP
ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
(Github Repo | Most Popular Model | Paper)
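
The curation step that makes MetaCLIP distinctive is balancing: candidate alt-texts are matched against a pool of metadata entries, and each entry's matches are capped so a handful of head concepts can't dominate the training set. A toy sketch of that capping step (the default threshold follows the paper's t = 20k setting; everything else here is illustrative):

```python
import random

def balance(matches: dict[str, list[str]], t: int = 20_000) -> list[str]:
    """Cap each metadata entry's matched texts at t to flatten the head."""
    kept = []
    for entry, texts in matches.items():
        kept.extend(texts if len(texts) <= t else random.sample(texts, t))
    return kept
```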