-
2024 is shaping up to be the year of multimodal machine learning. From real-time text-to-image models and open-vocabulary models to multimodal large language models like GPT-4V and Gemini Pro Vision, AI is primed for an unprecedented array of interactive multimodal applications and experiences.
-
While OpenAI’s CLIP model has garnered a lot of attention, it is far from the only game in town—and far from the best! On the OpenCLIP leaderboard, for instance, the largest and most capable CLIP model from OpenAI ranks just 41st(!) in its average zero-shot accuracy across 38 datasets.
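
You can browse and load those higher-ranking checkpoints directly through the open_clip package. A minimal sketch (the architecture/tag pair loaded below is just an illustrative choice; use any pair from the list):

```python
import open_clip

# List every (architecture, pretraining tag) pair that ships with OpenCLIP;
# many of these outrank OpenAI's original weights on the leaderboard's
# 38-dataset zero-shot average
for arch, tag in open_clip.list_pretrained():
    print(arch, tag)

# Load one of the open checkpoints
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
```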
-
awesome-clip-papers
The most impactful papers related to contrastive pretraining for multimodal models!
For a comprehensive catalog of papers pushing the state of CLIP models forward, check out this Awesome CLIP Papers GitHub repository. Additionally, the Zero-shot Prediction Plugin for FiftyOne allows you to apply any of the OpenCLIP-compatible models to your own data.
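
As a rough sketch of that workflow (the model zoo route shown here; the class list is a placeholder), you can load a CLIP model through FiftyOne and apply it to a dataset:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Load a small sample dataset and a CLIP model from the FiftyOne model zoo
dataset = foz.load_zoo_dataset("quickstart")
model = foz.load_zoo_model(
    "clip-vit-base32-torch",
    classes=["cat", "dog", "bird"],  # placeholder label set for zero-shot classification
)

# Run zero-shot prediction on every sample and inspect the results in the App
dataset.apply_model(model, label_field="zero_shot_predictions")
session = fo.launch_app(dataset)
```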
-
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
(Github Repo | Most Popular Model | Paper | Project Page)
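
The repo's README shows the core usage pattern. A condensed version (the image path and candidate captions are placeholders):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate text snippets
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each candidate snippet
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(probs)  # highest probability = most relevant snippet
```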
-
StyleCLIP
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
While CLIP on its own is useful for applications such as zero-shot classification, semantic search, and unsupervised data exploration, it is also used as a building block in a vast array of multimodal applications, from Stable Diffusion and DALL-E to StyleCLIP and OWL-ViT. For most of these downstream applications, the initial CLIP model is regarded as a “pre-trained” starting point, and the entire model is fine-tuned for its new use case.
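
A minimal sketch of what such a fine-tuning step can look like, assuming batches of preprocessed images and tokenized texts from the downstream domain (the symmetric contrastive loss mirrors the original CLIP objective; the checkpoint and learning rate are illustrative):

```python
import torch
import torch.nn.functional as F
import open_clip

# Start from pretrained weights; every parameter stays trainable so the
# whole model adapts to the new domain
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def training_step(images: torch.Tensor, texts: torch.Tensor) -> float:
    """One update on a batch of preprocessed images and tokenized texts."""
    image_features = F.normalize(model.encode_image(images), dim=-1)
    text_features = F.normalize(model.encode_text(texts), dim=-1)

    # Symmetric contrastive (InfoNCE) loss over the in-batch similarity matrix
    logits = model.logit_scale.exp() * image_features @ text_features.T
    labels = torch.arange(len(images), device=images.device)
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```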
-
klite
[NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222
(Github Repo | Paper)
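
In spirit, K-LITE swaps bare class names for knowledge-enriched prompts before encoding them with the text tower. A toy illustration (the definitions dict is a stand-in for the WordNet/Wiktionary lookups the paper draws on):

```python
# Stand-in for external-knowledge retrieval; in practice these definitions
# come from sources like WordNet and Wiktionary
definitions = {
    "tench": "a freshwater fish of the carp family",
    "mamey": "a tropical fruit with sweet orange flesh",
}

def knowledge_prompt(class_name: str) -> str:
    """Enrich a bare class name with external knowledge before text encoding."""
    knowledge = definitions.get(class_name)
    if knowledge:
        return f"a photo of a {class_name}, which is {knowledge}"
    return f"a photo of a {class_name}"

print(knowledge_prompt("tench"))
# -> "a photo of a tench, which is a freshwater fish of the carp family"
```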
-
MetaCLIP
ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
(Github Repo | Most Popular Model | Paper)
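
The curation step that makes MetaCLIP distinctive is balancing: candidate alt-texts are matched against a pool of metadata entries, and each entry's matches are capped so a handful of head concepts can't dominate the training set. A toy sketch of that capping step (the default threshold follows the paper's t = 20k setting; everything else here is illustrative):

```python
import random

def balance(matches: dict[str, list[str]], t: int = 20_000) -> list[str]:
    """Cap each metadata entry's matched texts at t to flatten the head."""
    kept = []
    for entry, texts in matches.items():
        kept.extend(texts if len(texts) <= t else random.sample(texts, t))
    return kept
```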