autodistill-metaclip vs sam-clip

|  | autodistill-metaclip | sam-clip |
|---|---|---|
| Mentions | 1 | 1 |
| Stars | 16 | 20 |
| Growth | - | - |
| Activity | 6.4 | 5.4 |
| Last commit | 5 months ago | 4 months ago |
| Language | Python | Python |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
autodistill-metaclip
MetaCLIP – Meta AI Research
I have been playing with MetaCLIP this afternoon and made https://github.com/autodistill/autodistill-metaclip as a pip-installable version. The Facebook repository has some guidance, but you have to pull the weights yourself, save them, etc.
My inference function (model.predict("image.png")) returns an sv.Classifications object that you can load into supervision for processing (e.g., to get the top-k predictions) [1].
The paper [2] notes the following in terms of performance:
> In Table 4, we observe that MetaCLIP outperforms OpenAI CLIP on ImageNet and average accuracy across 26 tasks, for 3 model scales. With 400 million training data points on ViT-B/32, MetaCLIP outperforms CLIP by +2.1% on ImageNet and by +1.6% on average. On ViT-B/16, MetaCLIP outperforms CLIP by +2.5% on ImageNet and by +1.5% on average. On ViT-L/14, MetaCLIP outperforms CLIP by +0.7% on ImageNet and by +1.4% on average across the 26 tasks.
[1] https://github.com/autodistill/autodistill-metaclip
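Below is a minimal usage sketch of the workflow described in the post. Only model.predict("image.png") and the sv.Classifications return type come from the post itself; the MetaCLIP class name, the CaptionOntology constructor argument, and the get_top_k call are assumptions modeled on other autodistill classification modules.

```python
# Hedged usage sketch: the constructor signature and get_top_k are assumptions
# based on other autodistill classification modules; predict() returning an
# sv.Classifications object is what the post above describes.
import supervision as sv
from autodistill.detection import CaptionOntology
from autodistill_metaclip import MetaCLIP

# Map text prompts to the class labels you want back
ontology = CaptionOntology({"a photo of a dog": "dog", "a photo of a cat": "cat"})
model = MetaCLIP(ontology=ontology)

# Zero-shot classification on a single image
results = model.predict("image.png")
assert isinstance(results, sv.Classifications)

# supervision-style post-processing, e.g. keep only the single best class
class_ids, confidences = results.get_top_k(1)
print(class_ids, confidences)
```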
sam-clip
What are some alternatives?
clip-interrogator - Image to prompt with BLIP and CLIP
autodistill - Images to inference with no labeling (use foundation models to train supervised models).
open_clip - An open source implementation of CLIP.
Track-Anything - Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
BLIP - PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
anylabeling - Effortless AI-assisted data labeling with AI support from YOLO, Segment Anything, MobileSAM!!
NumPyCLIP - Pure NumPy implementation of https://github.com/openai/CLIP
Instruct2Act - Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Text2LIVE - Official Pytorch Implementation for "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
SegmentAnythingin3D - Segment Anything in 3D with NeRFs (NeurIPS 2023)
aphantasia - CLIP + FFT/DWT/RGB = text to image/video
Chinese-CLIP - Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.