MetaCLIP – Meta AI Research

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

  • autodistill-metaclip

    MetaCLIP module for use with Autodistill.

  • I have been playing with MetaCLIP this afternoon and made https://github.com/autodistill/autodistill-metaclip as a pip-installable version. The Facebook repository has some guidance, but you have to pull the weights yourself, save them, etc.

    My inference function (model.predict("image.png")) returns an sv.Classifications object that you can load into supervision for processing (e.g. to get the top k) [1].

    The paper [2] notes the following in terms of performance:

    > In Table 4, we observe that MetaCLIP outperforms OpenAI CLIP on ImageNet and average accuracy across 26 tasks, for 3 model scales. With 400 million training data points on ViT-B/32, MetaCLIP outperforms CLIP by +2.1% on ImageNet and by +1.6% on average. On ViT-B/16, MetaCLIP outperforms CLIP by +2.5% on ImageNet and by +1.5% on average. On ViT-L/14, MetaCLIP outperforms CLIP by +0.7% on ImageNet and by +1.4% on average across the 26 tasks.

    [1] https://github.com/autodistill/autodistill-metaclip
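
The "get top k" step mentioned above can be illustrated without any model. The following is a minimal sketch in plain Python, not the supervision or autodistill-metaclip API: the labels and similarity scores are made-up values, and the helper names `softmax` and `top_k` are hypothetical. It assumes a CLIP-style model has already produced one image-text similarity score per candidate label.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, scores, k):
    """Return the k (label, score) pairs with the highest scores."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and image-text similarity scores, for illustration only.
labels = ["cat", "dog", "forklift", "person"]
logits = [2.0, 0.5, 4.0, 1.0]

probs = softmax(logits)
best = top_k(labels, probs, k=2)
for label, prob in best:
    print(f"{label}: {prob:.3f}")
```

In a real pipeline the scores would come from the model's prediction output rather than a hard-coded list.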

  • open_clip

    An open source implementation of CLIP.

  • https://github.com/mlfoundations/open_clip/blob/main/docs/op...

  • NumPyCLIP

    Pure NumPy implementation of https://github.com/openai/CLIP

  • I found CLIP to be _amazing_ for all kinds of image search, like search-by-text or search-by-image. I even ported it to NumPy to understand it better. The whole thing is less than 500 lines of Python: https://github.com/99991/NumPyCLIP
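
As a rough illustration of the search-by-text and search-by-image idea: CLIP maps images and text into the same embedding space, so search reduces to ranking stored image embeddings by cosine similarity to a query embedding. The sketch below uses plain Python and made-up 3-dimensional vectors. The file names, the vectors, and the `search` helper are assumptions for illustration; real embeddings would come from CLIP's image and text encoders and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, image_embeddings, k=3):
    """Rank stored images by similarity to the query and keep the top k."""
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up embeddings standing in for the output of an image encoder.
image_embeddings = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Search by text: a made-up embedding standing in for an encoded text query.
text_query = [0.8, 0.2, 0.1]
results = search(text_query, image_embeddings, k=2)

# Search by image: reuse a stored image embedding as the query.
similar = search(image_embeddings["city.jpg"], image_embeddings, k=1)
```

The same `search` function serves both modes because the query only needs to live in the shared embedding space, regardless of whether it came from text or an image.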

  • BLIP

    PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

  • I suggest trying BLIP for this. I've had really good results from that.

    https://github.com/salesforce/BLIP

    I built a tiny Python CLI wrapper for it to make it easier to try: https://github.com/simonw/blip-caption

  • blip-caption

    Generate captions for images with Salesforce BLIP

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number means a more popular project.
