DALLE2-pytorch
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
The paper describing the model is public and has been implemented here, but implementing it is not the hard part. Training and running the model likely takes months of compute and dozens of gigabytes of VRAM, which probably costs several hundred thousand dollars.
Google's Imagen has also been implemented. It is actually simpler and seems to perform better, especially for text, but the story is similar. As with GPT-3, not many people are willing to burn hundreds of thousands of dollars training a model only to release it to the public for free. And even if they do, good luck running it. BigScience/HuggingFace just released a GPT-3 equivalent. It requires 640GB of VRAM to run at full speed. From what I gather, that would cost in the realm of $40 an hour to run on the cloud. These image models are apparently smaller but will still cost a lot.
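To make the $40 an hour figure concrete, here is a rough back-of-the-envelope sketch. The 640GB requirement comes from the comment above; the 80GB-per-GPU size and the $5 per GPU-hour rate are assumptions picked for illustration, not quoted cloud prices, and the helper names are hypothetical.

```python
import math


def gpus_needed(total_vram_gb: float, vram_per_gpu_gb: float) -> int:
    """Smallest number of GPUs whose combined VRAM covers the requirement."""
    return math.ceil(total_vram_gb / vram_per_gpu_gb)


def hourly_cost(total_vram_gb: float, vram_per_gpu_gb: float,
                usd_per_gpu_hour: float) -> float:
    """Estimated hourly cost of renting enough GPUs to hold the model."""
    return gpus_needed(total_vram_gb, vram_per_gpu_gb) * usd_per_gpu_hour


# 640 GB is the figure from the post.
# 80 GB per GPU and $5 per GPU-hour are assumed values, not real quotes.
REQUIRED_VRAM_GB = 640
ASSUMED_VRAM_PER_GPU_GB = 80
ASSUMED_USD_PER_GPU_HOUR = 5.0

if __name__ == "__main__":
    n = gpus_needed(REQUIRED_VRAM_GB, ASSUMED_VRAM_PER_GPU_GB)
    cost = hourly_cost(REQUIRED_VRAM_GB, ASSUMED_VRAM_PER_GPU_GB,
                       ASSUMED_USD_PER_GPU_HOUR)
    print(f"{n} GPUs, about ${cost:.2f} per hour")
```

Under those assumptions the estimate works out to eight GPUs, which lines up with the ballpark in the post. Actual pricing varies a lot by provider and GPU type, so treat the rate as a placeholder to swap out.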