open_clip
taming-transformers
| | open_clip | taming-transformers |
|---|---|---|
| Mentions | 27 | 35 |
| Stars | 8,090 | 5,261 |
| Growth | 7.0% | 3.7% |
| Activity | 8.4 | 0.0 |
| Last commit | 14 days ago | 5 months ago |
| Language | Jupyter Notebook | Jupyter Notebook |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
open_clip
- A History of CLIP Model Training Data Advances
While OpenAI’s CLIP model has garnered a lot of attention, it is far from the only game in town—and far from the best! On the OpenCLIP leaderboard, for instance, the largest and most capable CLIP model from OpenAI ranks just 41st(!) in its average zero-shot accuracy across 38 datasets.
- How to Build a Semantic Search Engine for Emojis
Whenever I’m working on semantic search applications that connect images and text, I start with a family of models known as contrastive language image pre-training (CLIP). These models are trained on image-text pairs to generate similar vector representations or embeddings for images and their captions, and dissimilar vectors when images are paired with other text strings. There are multiple CLIP-style models, including OpenCLIP and MetaCLIP, but for simplicity we’ll focus on the original CLIP model from OpenAI. No model is perfect, and at a fundamental level there is no right way to compare images and text, but CLIP certainly provides a good starting point.
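As a rough illustration of the embedding idea described above, here is a minimal sketch using the open_clip API; the model name, pretrained tag, and image path are illustrative choices, not anything prescribed by the post:

```python
# Minimal sketch: embed an image and two captions with open_clip,
# then compare them with cosine similarity.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("photo.jpg")).unsqueeze(0)  # placeholder image path
texts = tokenizer(["a photo of a dog", "a photo of a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    # Normalize so the dot product becomes cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = image_features @ text_features.T

print(similarity)  # higher score = better image/text match
```

For a search engine, the same trick scales up: embed every candidate (emoji description, caption, document) once, store the vectors, and rank them by cosine similarity against each query embedding.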
- MetaCLIP – Meta AI Research
- COMFYUI SDXL WORKFLOW INBOUND! Q&A NOW OPEN! (WIP EARLY ACCESS WORKFLOW INCLUDED!)
In the model card it says: pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
- What's up in the Python community? – April 2023
https://replicate.com/pharmapsychotic/clip-interrogator
using:
cfg.apply_low_vram_defaults()
interrogate_fast()
I tried lighter models like ViT-B/32 on LAION-400M and others; they are all very slow to load and use (model list: https://github.com/mlfoundations/open_clip).
I'm desperately looking for something more modest and lightweight.
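For context, the calls quoted above map onto clip-interrogator roughly as follows. This is a hedged sketch of its low-VRAM path; the clip_model_name value and image path are just plausible choices:

```python
# Sketch of clip-interrogator's low-VRAM configuration.
from PIL import Image
from clip_interrogator import Config, Interrogator

cfg = Config(clip_model_name="ViT-L-14/openai")  # illustrative model choice
cfg.apply_low_vram_defaults()  # trims caches/chunk sizes to fit small GPUs

ci = Interrogator(cfg)  # downloads BLIP and CLIP weights on first run
image = Image.open("photo.jpg").convert("RGB")  # placeholder image
print(ci.interrogate_fast(image))  # fast mode skips the slower prompt-refinement search
```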
- Alternate LLMs
They have a great track record on similar-scale projects. They've partnered with /r/datahoarders and volunteers on the creation of training sets, including the 5.8-billion image/text-pair dataset they used to train a better version of CLIP.
- Does anyone have recommendations for GPT-3-like performance from open-source models? It seems flan-t5 and its variants are the way to go - any other ones?
- 🐍 5 Awesome Python Projects People Don’t Know About
- Some notes on porting SD2 over to iPhone (or other platforms)
The text encoder uses a new vocabulary set; make sure you copy it from the open_clip repo: https://github.com/mlfoundations/open_clip (I also have these files available at: https://github.com/liuliu/swift-diffusion/tree/liu/unet/examples/open_clip).
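Since SD2's text encoder comes from OpenCLIP's ViT-H/14 (per the SD 2.0 release notes), one quick way to inspect the tokenizer and vocabulary in question is through open_clip itself. This is an illustrative sanity check, not part of the port:

```python
# Tokenize a prompt with the OpenCLIP BPE tokenizer used by SD2's text encoder.
import open_clip

tokenizer = open_clip.get_tokenizer("ViT-H-14")
tokens = tokenizer(["a photograph of an astronaut riding a horse"])
print(tokens.shape)   # torch.Size([1, 77]): fixed 77-token context, zero-padded
print(tokens[0][:10]) # first few token IDs, including the start-of-text token
```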
- Stable Diffusion 2.0 Release
> Writing a training loop for CLIP manually wound up with me banging against all sorts of strange roadblocks and missing bits of documentation, and I still don't have it working.
There is working training code for OpenCLIP: https://github.com/mlfoundations/open_clip
But training multi-modal text-to-image models is still a _very_ new thing in the software world. Given that, my experience has been that it has never been easier to get to work on this stuff from the software point of view. The hardware is the tricky bit (along with preventing bandwidth issues on distributed systems).
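For readers wondering what such a training loop boils down to, here is a minimal sketch of the symmetric contrastive (InfoNCE) loss at the heart of CLIP training, which open_clip implements in full. The feature shapes and fixed logit scale below are illustrative stand-ins for real encoder outputs:

```python
# Minimal CLIP-style contrastive loss: matched image/text pairs sit on the
# diagonal of the similarity matrix and are treated as classification targets.
import torch
import torch.nn.functional as F

def clip_loss(image_features, text_features, logit_scale):
    # Normalize embeddings so the dot product is cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the true pairs.
    logits = logit_scale * image_features @ text_features.T
    labels = torch.arange(logits.shape[0], device=logits.device)

    # Symmetric cross-entropy over image->text and text->image directions.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

# Random features standing in for encoder outputs:
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_loss(img, txt, logit_scale=100.0))
```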
taming-transformers
- Automatic1111 for Intel Arc (A380 Tested)
taming-transformers
- Why are ChatGPT and other large language models not feasible to use locally on consumer-grade hardware while Stable Diffusion is?
See https://arxiv.org/abs/2012.09841 for prior work. The SD authors swap out the Transformer and its language-modelling objective for a UNet with a diffusion objective. In general, the more inductive bias your model has, the more efficient it can be. ChatGPT runs purely on a Transformer architecture, which has far fewer priors than a CNN and requires far more parameters as a result. This may not be the case in the future.
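A toy way to see the parameter cost of dropping priors is to compare a small convolution, which shares one kernel across every spatial position, with a fully connected layer over the same tensor, which learns a separate weight for every position pair. The layer sizes below are arbitrary:

```python
# Inductive bias vs. parameter count, on a 16-channel 32x32 feature map.
import torch.nn as nn

conv = nn.Conv2d(16, 16, kernel_size=3, padding=1)  # local, translation-equivariant prior
dense = nn.Linear(16 * 32 * 32, 16 * 32 * 32)       # no spatial prior at all

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv))   # 2,320 parameters
print(count(dense))  # ~268 million parameters for the same-sized mapping
```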
- AI Is Coming For Commercial Art Jobs. Can It Be Stopped? (Greg Rutkowski quoted)
I say this to everyone... Even if SD and the model are legitimate and legal, do not go around commercialising their outputs or claiming ownership over them, and if you do, properly cite the source of the model and system along with them. In https://github.com/CompVis/stable-diffusion, https://github.com/CompVis/taming-transformers and https://huggingface.co/CompVis/stable-diffusion-v1-4 there are citations provided for you to use for a reason. I recommend you use them.
- Stable-diffusion in Nix
```bash
# Copy models as described in README
cp ~/Downloads/model.ckpt .
cp ~/Downloads/GFPGANv1.3.pth .
# Clone other repos as mentioned in README
mkdir repositories
git clone https://github.com/CompVis/stable-diffusion.git repositories/stable-diffusion
git clone https://github.com/CompVis/taming-transformers.git repositories/taming-transformers
git clone https://github.com/sczhou/CodeFormer.git repositories/CodeFormer
git clone https://github.com/salesforce/BLIP.git repositories/BLIP
export NIXPKGS_ALLOW_UNFREE=1
nix-shell default.nix
# Also from the Linux instructions; can probably be added to default.nix
pip install torch --extra-index-url https://download.pytorch.org/whl/cu113
python webui.py
```
- How do I run Stable Diffusion and sharing FAQs
```
Pip subprocess error:
ERROR: Error [WinError 2] The system cannot find the file specified while executing command
git clone -q https://github.com/CompVis/taming-transformers.git 'C:\stable-diffusion\stable-diffusion-main\src\taming-transformers'
ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?
```
- I made a share pic for the img2img results that have surfaced so far
```bash
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .
```
- Have image generation project idea, need tool pointers though
You can have a look at style transfer or the recent advances in GANs (VQGAN, for example). The approach you would want to try is called fine-tuning: retraining an already pretrained model on your data. In terms of programming framework, I would guess PyTorch is the best pick.
Style transfer: https://pytorch.org/tutorials/advanced/neural_style_tutorial.html
VQGAN: https://github.com/CompVis/taming-transformers
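A minimal sketch of the fine-tuning recipe described above; the backbone, learning rate, class count, and random stand-in data are all illustrative placeholders:

```python
# Fine-tuning sketch: start from pretrained weights, swap the head,
# and continue training on your own (here: random) data.
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")  # pretrained backbone
model.fc = torch.nn.Linear(model.fc.in_features, 10)          # new head for 10 classes

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)    # small LR: adapt, don't restart
loss_fn = torch.nn.CrossEntropyLoss()

# Random tensors standing in for a real labeled image dataset.
data = TensorDataset(torch.randn(32, 3, 224, 224), torch.randint(0, 10, (32,)))
loader = DataLoader(data, batch_size=8)

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```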
- Ask HN: Computer Vision Project Ideas?
- RED CLIFFS OF EUPHORIA - Another new drop in the HOT new Zoom Morph Collection - 1 of 1 - on SALE - Check it out!
MIT is a license that permits commercial use. The reference implementations of VQGAN and CLIP are both MIT-licensed.
- I fed a bunch of unique item names to an AI and these are the videos it generated.
What are some alternatives?
CLIP - CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
stable-diffusion - This version of CompVis/stable-diffusion features an interactive command-line script that combines text2img and img2img functionality in a "dream bot" style interface, a WebGUI, and multiple features and other enhancements. [Moved to: https://github.com/invoke-ai/InvokeAI]
VQGAN-CLIP - Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
stable-diffusion - Optimized Stable Diffusion modified to run on lower GPU VRAM
stable-diffusion-webui - Stable Diffusion web UI [Moved to: https://github.com/sd-webui/stable-diffusion-webui]
stable-diffusion-webui - Stable Diffusion web UI [Moved to: https://github.com/Sygil-Dev/sygil-webui]
stable-diffusion - A latent text-to-image diffusion model
DALLE-pytorch - Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
stable-diffusion-webui - Stable Diffusion web UI
CodeFormer - [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer