MiniGPT-4 vs stable-diffusion-webui-wd14-tagger
| | MiniGPT-4 | stable-diffusion-webui-wd14-tagger |
|---|---|---|
| Mentions | 37 | 15 |
| Stars | 24,899 | 888 |
| Growth | 0.8% | - |
| Activity | 9.1 | 8.6 |
| Last commit | 12 days ago | 10 months ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | - |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
MiniGPT-4
- "Building Machines That Learn and Think Like People", 7 Years Later
I just think the tech has been out for so long that it's not as big of a deal. MiniGPT-4 has been out for 6 months! Of course the descriptions aren't exactly GPT-4 grade, but with Mistral 7B being used as the language model instead of LLaMA 7B, the reasoning ability will improve noticeably.
[1] https://github.com/Vision-CAIR/MiniGPT-4
- MiniGPT-4 Inference on CPU
- Multimodal LLM for infographics images
Aren't there only two open multimodal LLMs, LLaVA and MiniGPT-4?
- AI trained on photos
For LLM visual instruction, you can use LLaVA, LaVIN, or MiniGPT-4.
- CLIP and DeepDanbooru Alternatives For Prompt Generation [Relevant Self-Promotion]
- Looking for a pre-trained food recognition model
Please read the rules before posting. If you want a model for visual instruction, use LLaVA, LaVIN, or MiniGPT-4.
- MiniGPT-4 (Vicuna 13B + images)
- Upload a photo of your meal and get roasted by ChatGPT
So we use MiniGPT-4 for image parsing, and yep, it does return a pretty detailed (albeit not always accurate) description of the photo. You can actually play around with it on Hugging Face here.
We use MiniGPT-4 first to interpret the image and then pass the results onto GPT-4. Hopefully, once GPT-4 makes its multi-modal functionality available, we can do it all in one request.
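A minimal sketch of that two-stage pipeline, assuming the pre-1.0 OpenAI Python client for the GPT-4 step; `minigpt4_describe` is a hypothetical placeholder for however you host MiniGPT-4 (it is not a published API):

```python
# Two-stage flow from the post: MiniGPT-4 describes the photo, GPT-4 roasts it.
# Assumes OPENAI_API_KEY is set in the environment (read by the openai client).
import openai

def minigpt4_describe(image_path: str) -> str:
    """Hypothetical wrapper: run your MiniGPT-4 deployment on the image."""
    raise NotImplementedError("call your hosted MiniGPT-4 instance here")

def roast_meal(image_path: str) -> str:
    description = minigpt4_describe(image_path)  # stage 1: image -> text
    response = openai.ChatCompletion.create(     # stage 2: text -> roast
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You playfully roast meals."},
            {"role": "user", "content": f"The meal looks like this: {description}"},
        ],
    )
    return response["choices"][0]["message"]["content"]
```

Once GPT-4 exposes multimodal input directly, the first stage collapses into the same request.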
- Give some love to multimodal models trained on censored LLaMA-based models
But I would like to bring up that there are some multimodal models (LLaVA, MiniGPT-4) that are built on censored LLaMA-based models like Vicuna. I tried several multimodal models, including LLaVA, MiniGPT-4, and BLIP-2. LLaVA has very good captioning and question-answering abilities and is also much faster than the others (basically real time), though it has some hallucination issues.
stable-diffusion-webui-wd14-tagger
- CLIP and DeepDanbooru Alternatives For Prompt Generation [Relevant Self-Promotion]
- Ideas for extensions?
Create an extension like 'send pictures' that uses the WD14 tagger, which is way more detailed and has options for NSFW etc. It's used in Automatic1111 and Kohya_ss, so there are extensions you can probably implement from. https://github.com/toriato/stable-diffusion-webui-wd14-tagger
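For context, here is a rough sketch of what calling a WD14-style tagger looks like outside the webui, assuming a checkpoint laid out like SmilingWolf's ONNX releases (a `model.onnx` next to a `selected_tags.csv`); exact input size, channel order, and padding vary per checkpoint, so treat the preprocessing as an assumption:

```python
import csv

import numpy as np
import onnxruntime as ort
from PIL import Image

def wd14_tags(image_path, model_path="model.onnx",
              tags_path="selected_tags.csv", threshold=0.35):
    session = ort.InferenceSession(model_path)
    inp = session.get_inputs()[0]
    _, height, width, _ = inp.shape  # these checkpoints are NHWC, e.g. 448x448
    img = Image.open(image_path).convert("RGB").resize((width, height))
    x = np.asarray(img, dtype=np.float32)[None, :, :, ::-1]  # assumes BGR input
    probs = session.run(None, {inp.name: np.ascontiguousarray(x)})[0][0]
    with open(tags_path, newline="", encoding="utf-8") as f:
        names = [row["name"] for row in csv.DictReader(f)]
    # keep tags whose confidence clears the threshold, strongest first
    return sorted(((n, float(p)) for n, p in zip(names, probs) if p >= threshold),
                  key=lambda t: -t[1])
```

An extension would wrap something like this behind a 'send pictures' button and expose the threshold and NSFW filtering as options.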
- vladmandic-WD14-Tagger
If anyone is interested, I made some changes to toriato's wd14-tagger so it now also works on the vladmandic webui; repo here. You can do a new installation, or use your old automatic1111 one by changing 3 files; instructions are on my repo. The LoRA files also work (there were some problems on the vlad issue page). I'm not a programmer and it's not perfect, though: for now, if you don't like the default tagger model you have to change it manually (instructions in the repo), and since it is basically a fork of toriato's version, any errors there will be here too.
- Community-trained SD 1.6 Model, can we do it?
Automatic captioning tools that can be used as a starting point for captions: this tool or this one.
- Is anyone able to make the tagger extension compatible with Vlad UI?
- What are your favorite Extensions?
wd14-tagger, to describe anime images and get a prompt idea
- Experiment AI Anime w/ C-Net 1.1 + GroundingDINO + SAM + MFR (workflow)
Use WD 1.4 tagger (https://github.com/toriato/stable-diffusion-webui-wd14-tagger) to extract prompt words from each frame (threshold 0.65), then use the dataset tag editor (https://github.com/toshiaki1729/stable-diffusion-webui-dataset-tag-editor) for batch editing, mainly:
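The list of edits is cut off in the excerpt above, but the batch-editing step it refers to typically amounts to rewriting the per-frame `.txt` tag files. A generic sketch (the tags and trigger word here are placeholders, not the post's actual edits):

```python
from pathlib import Path

DROP = {"lowres", "jpeg artifacts"}  # placeholder tags to strip
TRIGGER = "myconcept"                # placeholder trigger word to prepend

# WD14 taggers write one comma-separated .txt file of tags per image/frame
for txt in Path("frames").glob("*.txt"):
    tags = [t.strip() for t in txt.read_text(encoding="utf-8").split(",")]
    tags = [t for t in tags if t and t not in DROP]
    txt.write_text(", ".join([TRIGGER] + tags), encoding="utf-8")
```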
- Currently getting better results with Kohya_ss LoRAs (LyCORIS LoCon) than with DB, am I alone?
I recommend using EveryDream2. You'll need an 11GB VRAM GPU. There's no need to crop or resize images, just caption them, which can be done automatically with CLIP Interrogator or WD14 taggers. Make sure to add the trigger word for your subject. It's not a Dreambooth script; it's actual training, so it shouldn't be as destructive to the model as Dreambooth. Typically, using an LR of 1e-6 with a cosine scheduler over two epochs and a batch size of 4 works fine. This script supports validation, so you can actually watch in real-time whether the training is going well or if you're overfitting. I got very good results using it.
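To make that recipe concrete, here it is written out as a settings dict; the key names are assumptions for illustration (EveryDream2 is configured via JSON/CLI, so check its repo for the exact option names it accepts):

```python
import json

# Hyperparameters from the post; the key names are illustrative, not verbatim
# EveryDream2 options.
config = {
    "data_root": "input/my_subject",  # captioned images, no cropping/resizing
    "lr": 1e-6,
    "lr_scheduler": "cosine",
    "max_epochs": 2,
    "batch_size": 4,
}

with open("train_my_subject.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)
```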
- For LoRA training, isn't there a good AI that describes the pictures you want to use for training?
In my current process, I use CLIP Interrogator to produce a high-level caption and the wd14 tagger for more granular booru tags. Typically in that order, because you can append the results from the latter to the former. Both tools are more accurate than the standard interrogators in img2img and give you more flexibility and features as well. You still have to do some manual adjustments, but I generally prefer this process over starting from scratch.
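A sketch of that caption-then-tags flow, assuming the `clip-interrogator` package (its `Config`/`Interrogator` API is real) and reusing the hypothetical `wd14_tags` helper sketched earlier:

```python
from clip_interrogator import Config, Interrogator
from PIL import Image

# build once and reuse; model loading is the expensive part
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

def caption_for_training(image_path: str, threshold: float = 0.35) -> str:
    image = Image.open(image_path).convert("RGB")
    caption = ci.interrogate(image)  # high-level natural-language caption
    booru = [name for name, _ in wd14_tags(image_path, threshold=threshold)]
    return caption + ", " + ", ".join(booru)  # append granular booru tags
```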
- Captioning LoRAs
What are some alternatives?
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
clip-interrogator - Image to prompt with BLIP and CLIP
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
batch-face-swap - Automatically detects faces and replaces them
AutoGPT - AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
sd_dreambooth_extension
BooruDatasetTagManager
stable-diffusion-webui - Stable Diffusion web UI
bark - 🔊 Text-Prompted Generative Audio Model
automatic - SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
mini-agi - MiniAGI is a simple general-purpose autonomous agent based on the OpenAI API.
stable-diffusion-webui-dataset-tag-editor - Extension to edit dataset captions for SD web UI by AUTOMATIC1111