A History of CLIP Model Training Data Advances

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • YOLO-World

    [CVPR 2024] Real-Time Open-Vocabulary Object Detection

  • 2024 is shaping up to be the year of multimodal machine learning. From real-time text-to-image models and open-world vocabulary models to multimodal large language models like GPT-4V and Gemini Pro Vision, AI is primed for an unprecedented array of interactive multimodal applications and experiences.

  • open_clip

    An open source implementation of CLIP.

  • While OpenAI’s CLIP model has garnered a lot of attention, it is far from the only game in town—and far from the best! On the OpenCLIP leaderboard, for instance, the largest and most capable CLIP model from OpenAI ranks just 41st(!) in its average zero-shot accuracy across 38 datasets.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • awesome-clip-papers

    The most impactful papers related to contrastive pretraining for multimodal models!

  • For a comprehensive catalog of papers pushing the state of CLIP models forward, check out this Awesome CLIP Papers Github repository. Additionally, the Zero-shot Prediction Plugin for FiftyOne allows you to apply any of the OpenCLIP-compatible models to your own data.

  • zero-shot-prediction-plugin

    Run zero-shot prediction models on your data

  • For a comprehensive catalog of papers pushing the state of CLIP models forward, check out this Awesome CLIP Papers Github repository. Additionally, the Zero-shot Prediction Plugin for FiftyOne allows you to apply any of the OpenCLIP-compatible models to your own data.

  • CLIP

    CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

  • (Github Repo | Most Popular Model | Paper | Project Page)

  • StyleCLIP

    Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)

  • While CLIP on its own is useful for applications such as zero-shot classification, semantic searches, and unsupervised data exploration, CLIP is also used as a building block in a vast array of multimodal applications, from Stable Diffusion and DALL-E to StyleCLIP and OWL-ViT. For most of these downstream applications, the initial CLIP model is regarded as a “pre-trained” starting point, and the entire model is fine-tuned for its new use case.

  • klite

    [NeurIPS 2022] code for "K-LITE: Learning Transferable Visual Models with External Knowledge" https://arxiv.org/abs/2204.09222

  • (Github Repo | Paper)

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

  • (Github Repo | Most Popular Model | Paper)

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts