WSDMCup2023 vs BLIP
| | WSDMCup2023 | BLIP |
|---|---|---|
| Mentions | 1 | 14 |
| Stars | 29 | 4,302 |
| Growth | - | 3.3% |
| Activity | 3.0 | 0.0 |
| Last commit | 14 days ago | 7 months ago |
| Language | Jupyter Notebook | Jupyter Notebook |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
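The page does not publish its activity formula, so the following is only a minimal sketch of how a recency-weighted score and a relative 0-10 scale could fit together. The exponential half-life, the function names, and the percentile mapping are all assumptions made for illustration.

```python
from datetime import date


def recency_weighted_activity(commit_dates, today, half_life_days=30.0):
    """Sum one weight per commit; a commit's weight halves every half_life_days.

    The half-life decay is an assumed weighting, not the site's actual formula.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def relative_score(raw, all_raw):
    """Map a raw score onto 0-10 by the share of tracked projects scoring lower.

    Under this assumed mapping, 9.0 means 90% of projects score lower,
    i.e. the project is in the top 10%.
    """
    below = sum(1 for r in all_raw if r < raw)
    return round(10.0 * below / len(all_raw), 1)


if __name__ == "__main__":
    today = date(2024, 1, 31)
    commits = [date(2024, 1, 31), date(2024, 1, 1)]
    print(recency_weighted_activity(commits, today))
    print(relative_score(90, list(range(100))))
```

With a 30-day half-life, a commit made today counts fully and one made 30 days ago counts half, which matches the statement that recent commits weigh more than older ones.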
WSDMCup2023
- WSDM Cup 2023
Starter pack on GitHub
BLIP
- MetaCLIP – Meta AI Research
I suggest trying BLIP for this. I've had really good results from that.
https://github.com/salesforce/BLIP
I built a tiny Python CLI wrapper for it to make it easier to try: https://github.com/simonw/blip-caption
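For readers who would rather call BLIP from Python than from a CLI, here is a minimal sketch. It assumes the Hugging Face `transformers` port of BLIP and the `Salesforce/blip-image-captioning-base` checkpoint rather than the scripts in the salesforce/BLIP repository, and it needs `transformers`, `torch`, and `Pillow` installed. The helper names `load_blip` and `caption_image` and the file name `photo.jpg` are made up for this example.

```python
def load_blip(model_id="Salesforce/blip-image-captioning-base"):
    """Load processor and model; downloads the weights on first use.

    Assumption: the Hugging Face transformers port, not the original repo code.
    """
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def caption_image(image, processor, model, max_new_tokens=30):
    """Generate a caption for one image (for example a PIL.Image in RGB mode)."""
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


if __name__ == "__main__":
    from PIL import Image

    processor, model = load_blip()
    # "photo.jpg" is a placeholder path for illustration.
    print(caption_image(Image.open("photo.jpg").convert("RGB"), processor, model))
```

Loading is kept separate from captioning so the model is loaded once and reused across many photos.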
- Is there a website where you can upload a photo and get the description in a paragraph?
You can download the source and run it yourself from here: https://github.com/salesforce/BLIP
- Stable Diffusion v2-1-unCLIP model released
Then there's also BLIP (Bootstrapping Language-Image Pre-training).
- GPT-4 shows emergent Theory of Mind on par with an adult. It scored in the 85+ percentile for a lot of major college exams. It can also do taxes and create functional websites from a simple drawing
Or BLIP
- meme
GitHub - salesforce/BLIP: PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- Object Recognition for Photo Metadata
From what I understand, what's most important to you is having a model that's already trained on something, rather than the architecture. YOLO is probably fine, as would be some of the older ones. You should be able to find a model that's been pretrained on COCO, and you can look and see what classes are included. I don't know if there are other broadly trained models available that will serve your purpose. What I'd do is just run your picture through a COCO-trained object detection model and see if the annotations do what you want.
Though backing up a bit, there are also image captioning models that may better do what you want to do for organizing your photos. I'm not really familiar with any. I did come across BLIP the other day, but I haven't used it: https://github.com/salesforce/BLIP
This may be a better way to get at what you want
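To check whether a detector's annotations "do what you want", one option is to reduce its output to a short tag list per photo. The sketch below assumes the detections have already been turned into hypothetical `(label, score)` pairs. A real COCO-trained model usually returns class indices that you would map to names first, and the 0.5 threshold is an arbitrary choice.

```python
def detections_to_tags(detections, min_score=0.5):
    """Turn (label, score) detections into a de-duplicated tag list.

    Keeps the best score per label, drops anything below min_score,
    and orders tags by confidence (ties broken alphabetically).
    """
    best = {}
    for label, score in detections:
        if score >= min_score and score > best.get(label, 0.0):
            best[label] = score
    return sorted(best, key=lambda label: (-best[label], label))


if __name__ == "__main__":
    # Made-up detections for illustration only.
    sample = [("dog", 0.92), ("person", 0.81), ("dog", 0.40), ("frisbee", 0.30)]
    print(detections_to_tags(sample))
```

Tags produced this way can be written into photo metadata or compared with a caption model's output for the same picture.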
- I have a problem with the "interrogate" function of Automatic1111's fork. Can someone help me?
git clone https://github.com/salesforce/BLIP.git repositories/BLIP
- Stable-diffusion in Nix
# Copy models as described in README
cp ~/Downloads/model.ckpt .
cp ~/Downloads/GFPGANv1.3.pth .
# Clone other repos as mentioned in README
mkdir repositories
git clone https://github.com/CompVis/stable-diffusion.git repositories/stable-diffusion
git clone https://github.com/CompVis/taming-transformers.git repositories/taming-transformers
git clone https://github.com/sczhou/CodeFormer.git repositories/CodeFormer
git clone https://github.com/salesforce/BLIP.git repositories/BLIP
export NIXPKGS_ALLOW_UNFREE=1
nix-shell default.nix
pip install torch --extra-index-url https://download.pytorch.org/whl/cu113 # Also from linux instructions. Can probably be added to default.nix
python webui.py
- My easy-to-install Windows GUI for Stable Diffusion is ready for a beta release! It supports img2img as well, various samplers, can run multiple scales per image automatically, and more!
Also check img2text (basically to prompt): https://github.com/salesforce/BLIP
- [D] Author Interview - BLIP: Bootstrapping Language-Image Pre-training (Video)
What are some alternatives?
Pathfinder2 - Paths, trajectories, splines, the number 2, and a whole lot of swag.
CLIP - CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
website - The repo for the website development competition of the @strapi-community.
a-PyTorch-Tutorial-to-Image-Captioning - Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
rihal-challenges - This repository is used to house Rihal's challenges for hiring.
CodeFormer - [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
ansible-challenge - A series of challenges for the Steampunk Ansible Challenge competition
virtex - [CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
nix-stable-diffusion - Nix-friendly fork of: Optimized Stable Diffusion modified to run on lower GPU VRAM
taming-transformers - Taming Transformers for High-Resolution Image Synthesis
rtic-gcn-pytorch - Official PyTorch Implementation of RITC
ghci-ng