Our great sponsors
-
MetaCLIP
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
I have been playing with MetaCLIP this afternoon and made https://github.com/autodistill/autodistill-metaclip as a pip installable version. The Facebook repository has some guidance but you have to pull the weights yourself, save them, etc.
My inference function (model.predict("image.png")) return an sv.Classifications object that you can load into supervision for processing (i.e. get top k) [1].
The paper [2] notes the following in terms of performance:
> In Table 4, we observe that MetaCLIP outperforms OpenAI CLIP on ImageNet and average accuracy across 26 tasks, for 3 model scales. With 400 million training data points on ViT-B/32, MetaCLIP outperforms CLIP by +2.1% on ImageNet and by +1.6% on average. On ViT-B/16, MetaCLIP outperforms CLIP by +2.5% on ImageNet and by +1.5% on average. On ViT-L/14, MetaCLIP outperforms CLIP by +0.7% on ImageNet and by +1.4% on average across the 26 tasks.
[1] https://github.com/autodistill/autodistill-metaclip
https://github.com/mlfoundations/open_clip/blob/main/docs/op...
I found CLIP to be _amazing_ for all kinds of image search, like search-by-text or search-by-image. I even ported it to NumPy to understand it better. The whole thing is less than 500 lines of Python: https://github.com/99991/NumPyCLIP
I suggest trying BLIP for this. I've had really good results from that.
https://github.com/salesforce/BLIP
I built a tiny Python CLI wrapper for it to make it easier to try: https://github.com/simonw/blip-caption
I suggest trying BLIP for this. I've had really good results from that.
https://github.com/salesforce/BLIP
I built a tiny Python CLI wrapper for it to make it easier to try: https://github.com/simonw/blip-caption