open_clip
DALLE-pytorch
Metric | open_clip | DALLE-pytorch
---|---|---
Mentions | 27 | 20
Stars | 8,090 | 5,468
Growth | 7.0% | -
Activity | 8.4 | 2.5
Last commit | 14 days ago | about 1 month ago
Language | Jupyter Notebook | Python
License | GNU General Public License v3.0 or later | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
open_clip
- A History of CLIP Model Training Data Advances
While OpenAI’s CLIP model has garnered a lot of attention, it is far from the only game in town—and far from the best! On the OpenCLIP leaderboard, for instance, the largest and most capable CLIP model from OpenAI ranks just 41st(!) in its average zero-shot accuracy across 38 datasets.
- How to Build a Semantic Search Engine for Emojis
Whenever I’m working on semantic search applications that connect images and text, I start with a family of models known as contrastive language image pre-training (CLIP). These models are trained on image-text pairs to generate similar vector representations or embeddings for images and their captions, and dissimilar vectors when images are paired with other text strings. There are multiple CLIP-style models, including OpenCLIP and MetaCLIP, but for simplicity we’ll focus on the original CLIP model from OpenAI. No model is perfect, and at a fundamental level there is no right way to compare images and text, but CLIP certainly provides a good starting point.
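The idea of matching text to images through embeddings can be sketched with plain cosine similarity. This is a minimal illustration only: the three-dimensional vectors below are made up for the example, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical. In practice the embeddings would come from a CLIP-style model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, item_embeddings):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in item_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-written vectors standing in for image embeddings (assumption:
# a real system would obtain these from a CLIP-style image encoder).
toy_image_embeddings = {
    "dog_photo": [0.9, 0.1, 0.0],
    "cat_photo": [0.1, 0.9, 0.0],
    "car_photo": [0.0, 0.1, 0.9],
}

# Toy vector standing in for the text embedding of a query about a dog.
toy_text_embedding = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(toy_text_embedding, toy_image_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A matching image-caption pair should score close to 1, while unrelated pairs score lower; a semantic search engine returns the items at the top of this ranking.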
- MetaCLIP – Meta AI Research
- COMFYUI SDXL WORKFLOW INBOUND! Q&A NOW OPEN! (WIP EARLY ACCESS WORKFLOW INCLUDED!)
In the model card it says: pretrained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L).
- What's up in the Python community? – April 2023
https://replicate.com/pharmapsychotic/clip-interrogator
using:
cfg.apply_low_vram_defaults()
interrogate_fast()
I tried lighter models such as vit32/laion400 and others, but all of them are very slow to load or use (model list: https://github.com/mlfoundations/open_clip).
I'm desperately looking for something more modest and lightweight.
- Alternate LLM's
They have a great track record on similar scale projects. They've partnered with /r/datahoarders and volunteers on creation of training sets including their 5.8 billion image/text-pair dataset that they used to train a better version of CLIP.
- Does anyone have recommendations for GPT3 like performance for open-source models? It seems flan-t5 and its variants are the way to go - any other ones?
- 🐍 5 Awesome Python Projects People Don’t Know About
- Some notes on porting SD2 over to iPhone (or other platforms)
The text encoder uses a new vocabulary set; make sure you copy it from the open_clip repo: https://github.com/mlfoundations/open_clip (I also have these available at https://github.com/liuliu/swift-diffusion/tree/liu/unet/examples/open_clip).
- Stable Diffusion 2.0 Release
> Writing a training loop for CLIP manually wound up with me banging against all sorts of strange roadblocks and missing bits of documentation, and I still don't have it working.
There is working training code for openCLIP https://github.com/mlfoundations/open_clip
But training multi-modal text-to-image models is still a _very_ new thing, in terms of the software world. Given that, my experience has been that it's never been easier to get to work on this stuff from the software POV. The hardware is the tricky bit (and preventing bandwidth issues on distributed systems).
DALLE-pytorch
- The Eleuther AI Mafia
It all started originally on lucidrains/dalle-pytorch in the months following the release of DALL-E (1). The group started as `dalle-pytorch-replicate` but was never officially "blessed" by Phil Wang who seems to enjoy being a free agent (can't blame him).
https://github.com/lucidrains/DALLE-pytorch/issues/116 is where the discord got kicked off originally. There are a lot of other interactions between us on GitHub there. You should be able to find when Phil was approached by Jenia Jitsev, Jan Ebert, and Mehdi Cherti (all starting LAION members), who graciously offered the chance to replicate the DALL-E paper using their available compute on the JUWELS and JUWELS Booster HPC systems. This all predates Emad's arrival. I believe he showed up around the time of guided diffusion and GLIDE, but it may have been a bit earlier.
Data work originally focused on amassing several of the bigger datasets of the time. Getting CC12M downloaded and trained on was something of an early milestone (robvanvolt's work). A lot of early work was like that though, shuffling through CC12M, COCO, etc. with the dalle-pytorch codebase until we got an avocado armchair.
Christoph Schuhmann was an early contributor as well and great at organizing and rallying. He focused a lot on the early data scraping work for what would become the "LAION5B" dataset. I don't want to credit him with the coding, and I'm ashamed to admit I can't recall who did much of the work there, but a distributed scraping program was developed (the name was something@home... not scraping@home?).
The discord link on Phil Wang's readme at dalle-pytorch got a lot of traffic and a lot of people who wanted to pitch in with the scraping effort.
Eventually a lot of people from Eleuther and many other teams mingled with us, and some sort of non-profit org was created in Germany, I believe for legal purposes. The dataset continued to grow and the group moved from training DALL-E models to finetuning diffusion models.
The `CompVis` team were a great inspiration at the time, and much of their work on VQGAN and then latent diffusion models basically kept us motivated. As I mentioned, a personal motivation was Katherine Crowson's work on a variety of things like CLIP-guided VQGAN, diffusion, etc.
I believe Emad Mostaque showed up around the time GLIDE was coming out? I want to say he donated money for scrapers to be run on AWS to speed up data collection. I was largely hands off for much of the data scraping process and mostly enjoyed training new models on data we had.
As with any online community, things got pretty ill-defined, roles changed over, volunteers came and went, etc. I would hardly call this definitive, and that's at least partially the reason it's hard to trace as an outsider. That much of the early history is scattered about GitHub issues and PRs can't have helped, though.
- New text-to-image network from Google beats DALL-E
- [Project] DALL-3 - generate better images with fewer tokens through clip guided diffusion
If in general DDPM > GAN > VAE, why do transformer image generators all use VQVAE to decode images? Wouldn't it be better to use a diffusion model? I was wondering about this and started experimenting with different ways to decode vector-quantized embeddings with a diffusion model (see discussion here). After a lot of trial and error I got something that works pretty well.
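For readers unfamiliar with the term, "vector-quantized embeddings" are produced by snapping each encoder output vector to its nearest entry in a learned codebook, as in VQ-VAE-style models. The sketch below shows only that quantization step with a tiny hand-written codebook; the names `quantize` and `lookup` are hypothetical, and it does not reproduce the poster's diffusion decoder.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k]))
        for vec in vectors
    ]


def lookup(indices, codebook):
    """Replace each index with its codebook vector (the quantized embedding)."""
    return [codebook[i] for i in indices]


# Toy codebook with three 2-D entries (assumption: real codebooks hold
# thousands of higher-dimensional, learned entries).
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

# Toy encoder outputs, one vector per image patch.
toy_encoder_output = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]

token_ids = quantize(toy_encoder_output, toy_codebook)
quantized = lookup(token_ids, toy_codebook)
print(token_ids)
print(quantized)
```

A transformer image generator predicts sequences of such token indices, and a separate decoder (a VQVAE decoder, or the diffusion model the post proposes) turns the looked-up embeddings back into pixels.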
- Ask HN: Computer Vision Project Ideas?
"Discrete VAE", used as the backbone for OpenAI's DALL-E, reimplemented here (and in other places): https://github.com/lucidrains/DALLE-pytorch (code for training a discrete VAE)
- Crawling@Home: Help Build The Worlds Largest Image-Text Pair Dataset!
Here's the DALLE-pytorch git repo.
Since then, several efforts have been organized to replicate DALL-E. People organized initially around this awesome DALL-E replication repository, https://github.com/lucidrains/DALLE-pytorch, with some nice results that can be seen in the readme. More recently, as part of a Hugging Face event, new results have been achieved (see https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA ) and an online demo is now available: https://huggingface.co/spaces/flax-community/dalle-mini
- When was the last time you were as happy as a child about something?
Maybe with https://github.com/lucidrains/DALLE-pytorch and https://github.com/kobiso/DALLE-reproduction
- 9 Command-Line Tools to Go to Infinity & Beyond
Currently there are several projects trying to replicate DALL-E, here’s another one.
- Text to Image Generation
I have a working repo for that
https://github.com/lucidrains/dalle-pytorch
It just needs to be trained
- Are we ever going to get access to DALL-E?
and this
What are some alternatives?
DALL-E - PyTorch package for the discrete VAE used for DALL·E.
CLIP - CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
DALLE2-pytorch - Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
DALLE-datasets - This is a summary of easily available datasets for generalized DALLE-pytorch training.
deep-daze - Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
CoCa-pytorch - Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
imagen-pytorch - Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch
DALLE-reproduction - Reproducing OpenAI's DALLE model
big-sleep - A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
TimeSformer-pytorch - Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
taming-transformers - Taming Transformers for High-Resolution Image Synthesis
vit-pytorch - Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch