MiniGPT-4
sd-webui-lobe-theme
Metric | MiniGPT-4 | sd-webui-lobe-theme
---|---|---
Mentions | 37 | 77
Stars | 24,899 | 2,163
Growth | 0.8% | 5.0%
Activity | 9.1 | 9.3
Latest commit | 13 days ago | 22 days ago
Language | Python | TypeScript
License | BSD 3-clause "New" or "Revised" License | GNU Affero General Public License v3.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
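As a rough illustration of how these two figures can be read, the sketch below computes a month-over-month growth percentage and converts an activity score into a "top N%" share. The function names are hypothetical, the star counts in the usage lines are made up, and the linear 0 to 10 activity scale is an assumption extrapolated from the single example above (9.0 means top 10%); the site's actual formula is not stated here.

```python
def month_over_month_growth(stars_now: int, stars_month_ago: int) -> float:
    """Return the percentage change in stars over one month."""
    if stars_month_ago <= 0:
        raise ValueError("stars_month_ago must be positive")
    return (stars_now - stars_month_ago) / stars_month_ago * 100.0


def top_share_from_activity(activity: float) -> float:
    """Return the 'top N%' share implied by an activity score.

    ASSUMPTION: the score is linear on a 0-10 scale, so 9.0 maps to the
    top 10%. Only the 9.0 example is given in the text above.
    """
    if not 0.0 <= activity <= 10.0:
        raise ValueError("activity is assumed to lie in [0, 10]")
    return (10.0 - activity) * 10.0


if __name__ == "__main__":
    # Hypothetical star counts, chosen only to demonstrate the arithmetic.
    print(f"growth: {month_over_month_growth(1008, 1000):.1f}%")
    print(f"activity 9.0 -> top {top_share_from_activity(9.0):.0f}%")
```

Under that linear assumption, a higher activity score always corresponds to a smaller "top" share.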
MiniGPT-4
-
"Building Machines That Learn and Think Like People", 7 Years Later
I just think the tech has been out for so long it's not as big of a deal. MiniGPT-4 [1] has been out for 6 months! Of course the descriptions aren't exactly GPT-4 grade, but with Mistral 7B being used as the language model instead of LLaMA 7B, the reasoning ability will improve noticeably.
[1] https://github.com/Vision-CAIR/MiniGPT-4
- Minigpt4 Inference on CPU
-
Multimodal LLM for infographics images
Aren't there only two open multimodal LLMs, LLaVA and MiniGPT-4?
-
Ai trained on photos
For LLM visual instruction, you can use LLaVA, LaVIN, or MiniGPT-4.
- CLIP and DeepDanbooru Alternatives For Prompt Generation [Relevant Self-Promotion]
-
Looking for a pre trained food recognition model
Please read the rules before posting. If you want a model for visual instruction, use LLaVA, LaVIN, or MiniGPT-4.
- Minigpt-4 (Vicuna 13B + images)
-
Upload a photo of your meal and get roasted by ChatGPT
So we use MiniGPT-4 for image parsing, and yep, it does return a pretty detailed (albeit not always accurate) description of the photo. You can actually play around with it on Hugging Face here.
We use MiniGPT-4 first to interpret the image and then pass the results on to GPT-4. Hopefully, once GPT-4 makes its multimodal functionality available, we can do it all in one request.
-
Give some love to multi modal models trained on censored llama based models
But I would like to bring up that there are some multimodal models (LLaVA, MiniGPT-4) that are built on censored LLaMA-based models like Vicuna. I tried several multimodal models, such as LLaVA, MiniGPT-4, and BLIP-2. LLaVA has very good captioning and question-answering abilities and is also much faster than the others (basically real time), though it has some hallucination issues.
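The two-stage flow described in the meal-roasting post above (an image model produces a text description, which is then handed to a text-only model) can be sketched as follows. This is a minimal illustration, not that project's code: `describe_then_respond`, `fake_captioner`, and `fake_text_model` are hypothetical names, and the two stand-in functions take the place of real calls to MiniGPT-4 and GPT-4, whose actual interfaces are not shown in this document.

```python
from typing import Callable

# Hypothetical type aliases: any function with these shapes can be plugged in.
Captioner = Callable[[bytes], str]   # image bytes -> text description
TextModel = Callable[[str], str]     # prompt -> reply


def describe_then_respond(
    image_bytes: bytes,
    captioner: Captioner,
    text_model: TextModel,
    instruction: str,
) -> str:
    """Caption the image first, then pass the caption to a text-only model."""
    description = captioner(image_bytes)
    prompt = f"{instruction}\n\nImage description:\n{description}"
    return text_model(prompt)


# Stand-ins for the real models (ASSUMPTION: placeholders only).
def fake_captioner(image_bytes: bytes) -> str:
    return "a plate of pasta with tomato sauce"


def fake_text_model(prompt: str) -> str:
    return f"[reply based on a prompt of {len(prompt)} characters]"


if __name__ == "__main__":
    reply = describe_then_respond(
        b"\x89PNG...",  # placeholder bytes, not a real image
        fake_captioner,
        fake_text_model,
        "Comment on this meal.",
    )
    print(reply)
```

Keeping the two stages behind plain callables means the captioning step could later be dropped if a single multimodal model handles both image and text in one request, as the post anticipates.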
sd-webui-lobe-theme
-
Upscayl – Free and Open Source AI Image Upscaler
Upscayl is very approachable, but it lacked many features I needed. I ended up using https://github.com/AUTOMATIC1111/stable-diffusion-webui after upscaling became part of my regular workflow, but for someone who just needs a few images enhanced, it's an ideal tool.
-
The Basics of AI Image Generation: How to create your own AI-generated image using Stable Diffusion on your local machine.
For the Git alternative, simply right-click the folder where you want to put Stable Diffusion and select “Git Bash Here”, then paste this into the CLI: git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
-
Stable Cascade
ComfyUI is similar to Houdini in complexity, but immensely powerful. It's a joy to use.
There are also a large number of resources available for it on YouTube, GitHub (https://github.com/comfyanonymous/ComfyUI_examples), Reddit (https://old.reddit.com/r/comfyui), CivitAI, Comfy Workflows (https://comfyworkflows.com/), and OpenArt Flow (https://openart.ai/workflows/).
I still use AUTO1111 (https://github.com/AUTOMATIC1111/stable-diffusion-webui) and the recently released and heavily modified fork of AUTO1111 called Forge (https://github.com/lllyasviel/stable-diffusion-webui-forge).
-
Show HN: I made a local wrapper for Automatic 1111
Seems like an interesting project. Regarding the name, is there permission to use something so similar to AUTOMATIC1111 [1]?
> Diffusers will Cuda out of memory/perform very slowly for huge generations, like 2048x2048 images, while Auto 1111 SDK won't.
Do we have some numbers on this? I have seen AUTOMATIC1111 fall over whilst using only half of the available GPU VRAM. There seems to be some weirdness where it tries to allocate before de-allocating the last batch or something.
> You can use any of the 6 compatible RealEsrgran models/weights with our RealEsrgran pipeline for upscaling images. Here are the model ids:
I've previously had trouble trying to use the AUTOMATIC1111 upscalers; upscaling seems to need more GPU VRAM than just generating the image at the upscaled size in the first place.
[1] https://github.com/AUTOMATIC1111/stable-diffusion-webui
-
Stable Code 3B: Coding on the Edge
You might be thinking of Fooocus: https://github.com/lllyasviel/Fooocus
The Stable Diffusion web interface that got a lot of people's attention originally was Automatic1111: https://github.com/AUTOMATIC1111/stable-diffusion-webui
Fooocus is definitely more beginner-friendly. It does a lot of the prompt engineering for you. Automatic1111 has a ton of plugins, most notably ControlNet, which gives you fine-grained control over the images, but there is a learning curve.
- Google Imagen 2
-
Free or "practically-free" Ai picture generator?
Stable Diffusion https://github.com/AUTOMATIC1111/stable-diffusion-webui
-
Things to do, to put my old PC to use?
Make it into a Stable Diffusion server!
-
GTA 6 trailer screencaps, photorealistic style
There's no link version, you have to run it locally. You install it from here
-
Automatic1111 v1.7.0-RC published
Repository: AUTOMATIC1111/stable-diffusion-webui · Tag: v1.7.0-RC · Commit: 48fae7c · Released by: AUTOMATIC1111
What are some alternatives?
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
stable-diffusion-webui - Stable Diffusion web UI
FastChat - An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
ComfyUI - The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.
AutoGPT - AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
automatic - SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
stable-diffusion-webui-wd14-tagger - Labeling extension for Automatic1111's Web UI
stable-diffusion-webui-directml - Stable Diffusion web UI
BooruDatasetTagManager
stable-diffusion-webui-ux - Stable Diffusion web UI UX
bark - 🔊 Text-Prompted Generative Audio Model
stable-diffusion-webui-colab - stable diffusion webui colab