-
DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
-
CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
-
big-sleep
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
Not a complete answer to your question but you may find this discussion interesting:
https://github.com/lucidrains/DALLE2-pytorch/discussions/10
Inference cost and scale seem to be much more favourable than for large language models (for now).
This uses OpenAI’s CLIP model, which is open source: https://github.com/openai/clip
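For anyone unfamiliar with what CLIP contributes here: it embeds images and text into a shared vector space and scores how well a caption matches an image. Below is a toy sketch of just that scoring step. The vectors are made up and stand in for real CLIP embeddings, and `cosine`, `rank_captions` and the `scale` value are hypothetical names chosen for illustration, not part of the CLIP package.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_captions(image_vec, caption_vecs, scale=100.0):
    """Softmax over scaled cosine similarities; returns {caption: probability}.

    `scale` is an assumed temperature factor used only for this illustration.
    """
    logits = {c: scale * cosine(image_vec, v) for c, v in caption_vecs.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(l - top) for c, l in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Made-up embeddings; a real model would produce these from pixels and text.
image_vec = [0.9, 0.1, 0.2]
caption_vecs = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
}
probs = rank_captions(image_vec, caption_vecs)
best = max(probs, key=probs.get)
```

In CLIP-guided generators, a score like this is what the image gets nudged towards, which is why the text encoder matters as much as the image model.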
An older, but similar and still impressive alternative is available here: https://github.com/CompVis/latent-diffusion
If you have a decent amount of VRAM, you can use it to start generating images with their pre-trained models. They're nowhere near as impressive as DALL-E 2, but they're still pretty damn cool. I don't know what the exact memory requirements are, but I've gotten it to run on a 1080 Ti with 11 GB.
Also very interested in this. AFAIK, the best alternative to DALLE-type generation is CLIP-guided generation (such as Disco Diffusion [1] and MidJourney [2]), which can take anywhere from 1 to 20 minutes per image on an RTX A5000.
[1]: https://github.com/alembics/disco-diffusion
This needs distributed training...
Years ago I made a shared tensor library[1] which should allow people to do training in a distributed fashion around the world. Even with relatively slow internet connections, training should still make good use of all the compute available because the whole lot runs asynchronously with highly compressed and approximate updates to shared weights.
The end result is that every bit of computation added has some benefits.
Obviously, for a real large-scale effort, anti-cheat and anti-spam mechanisms would be needed to ensure nodes aren't deliberately sending bad data to hurt the group effort.
[1]: https://github.com/Hello1024/shared-tensor
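To illustrate what "highly compressed and approximate updates" to shared weights can look like, here is a minimal sketch of one common approach: each worker sends only its largest-magnitude changes and carries the rest forward locally. This is not taken from shared-tensor, whose actual encoding may differ; `Worker`, `compress_top_k` and `apply_message` are hypothetical names, and networking is left out entirely.

```python
def compress_top_k(update, k):
    """Keep the k largest-magnitude entries as (index, value) pairs."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return [(i, update[i]) for i in order[:k]]

class Worker:
    """Accumulates local updates and emits a sparse, lossy message."""

    def __init__(self, size, k):
        self.residual = [0.0] * size  # changes not yet sent to the group
        self.k = k

    def make_message(self, local_update):
        # Add what was left unsent last time, then send only the top-k entries.
        pending = [r + u for r, u in zip(self.residual, local_update)]
        message = compress_top_k(pending, self.k)
        sent = {i for i, _ in message}
        # Everything not sent stays in the residual for a later round.
        self.residual = [0.0 if i in sent else p for i, p in enumerate(pending)]
        return message

def apply_message(shared_weights, message):
    """Apply a sparse update in place; messages may arrive in any order."""
    for i, value in message:
        shared_weights[i] += value

shared = [0.0] * 4
worker = Worker(size=4, k=1)
apply_message(shared, worker.make_message([0.5, -0.1, 0.2, 0.0]))
apply_message(shared, worker.make_message([0.0, 0.0, 0.2, 0.0]))
```

Because the updates are additive, a slow node that delivers its message late still contributes, which fits the point that every bit of added computation has some benefit.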
In case anyone else is put off by the link referencing an answer that then links to something else with most likely higher hardware requirements that are not stated, the end of the rabbit hole seems to be here: https://github.com/openai/dalle-2-preview/issues/6#issuecomm...
TL;DR: A single NVIDIA A100 is most likely sufficient; with a lot of optimization and stepwise execution, a single 3090 Ti might also be within the realm of possibility.
and after a few hours got this: https://i.imgur.com/FxdfdmV.png
Not nearly as cool as the real DALL-E, but maybe I'm missing something.
[1] https://github.com/lucidrains/big-sleep