| | autodistill-metaclip | NumPyCLIP |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 16 | 4 |
| Growth | - | - |
| Activity | 6.4 | 5.2 |
| Latest Commit | 5 months ago | 11 months ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
autodistill-metaclip
MetaCLIP – Meta AI Research
I have been playing with MetaCLIP this afternoon and made https://github.com/autodistill/autodistill-metaclip as a pip-installable version. The Facebook repository has some guidance, but you have to pull the weights yourself, save them, etc.
My inference function (model.predict("image.png")) returns an sv.Classifications object that you can load into supervision for processing (e.g. getting the top k results) [1].
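Roughly, usage looks like the sketch below. This is a minimal, illustrative example assuming the usual autodistill CaptionOntology pattern; the prompt-to-label mapping is a placeholder you would swap for your own classes.

```python
from autodistill.detection import CaptionOntology
from autodistill_metaclip import MetaCLIP

# illustrative ontology: maps CLIP-style text prompts to class labels
# (assumption: this follows the standard autodistill ontology pattern)
model = MetaCLIP(
    ontology=CaptionOntology(
        {"a photo of a cat": "cat", "a photo of a dog": "dog"}
    )
)

# predict() returns an sv.Classifications object
results = model.predict("image.png")

# supervision can post-process the result, e.g. pull the top-k classes
top_class_ids, top_confidences = results.get_top_k(k=1)
print(top_class_ids, top_confidences)
```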
The paper [2] notes the following about performance:
> In Table 4, we observe that MetaCLIP outperforms OpenAI CLIP on ImageNet and average accuracy across 26 tasks, for 3 model scales. With 400 million training data points on ViT-B/32, MetaCLIP outperforms CLIP by +2.1% on ImageNet and by +1.6% on average. On ViT-B/16, MetaCLIP outperforms CLIP by +2.5% on ImageNet and by +1.5% on average. On ViT-L/14, MetaCLIP outperforms CLIP by +0.7% on ImageNet and by +1.4% on average across the 26 tasks.
[1] https://github.com/autodistill/autodistill-metaclip
[2] https://arxiv.org/abs/2309.16671 ("Demystifying CLIP Data")
NumPyCLIP
MetaCLIP – Meta AI Research
I found CLIP to be _amazing_ for all kinds of image search, like search-by-text or search-by-image. I even ported it to NumPy to understand it better. The whole thing is less than 500 lines of Python: https://github.com/99991/NumPyCLIP
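The search part boils down to cosine similarity between embeddings. Here is a minimal NumPy sketch of the ranking step, assuming you have already extracted features with any CLIP implementation (e.g. NumPyCLIP); `image_features` and `text_feature` are hypothetical placeholder arrays.

```python
import numpy as np

def rank_images(text_feature: np.ndarray, image_features: np.ndarray) -> np.ndarray:
    # L2-normalize so the dot product equals cosine similarity
    text_feature = text_feature / np.linalg.norm(text_feature)
    image_features = image_features / np.linalg.norm(
        image_features, axis=1, keepdims=True
    )
    # similarity of the query text to every image, best match first
    scores = image_features @ text_feature
    return np.argsort(-scores)

# placeholder data: 1000 images embedded into a 512-dim CLIP space
image_features = np.random.randn(1000, 512)
text_feature = np.random.randn(512)
print(rank_images(text_feature, image_features)[:5])  # indices of top-5 matches
```

Search-by-image works the same way: embed the query image instead of a text prompt and rank against the same index.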
What are some alternatives?
clip-interrogator - Image to prompt with BLIP and CLIP
blip-caption - Generate captions for images with Salesforce BLIP
open_clip - An open source implementation of CLIP.
BLIP - PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
MetaCLIP - ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
sam-clip - Use Grounding DINO, Segment Anything, and CLIP to label objects in images.
Text2LIVE - Official Pytorch Implementation for "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
aphantasia - CLIP + FFT/DWT/RGB = text to image/video
Chinese-CLIP - Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.