DALL-E
dalle-2-preview
| | DALL-E | dalle-2-preview |
|---|---|---|
| Mentions | 31 | 61 |
| Stars | 10,692 | 1,049 |
| Growth | 0.6% | 0.0% |
| Activity | 0.0 | 1.8 |
| Latest commit | about 2 months ago | over 1 year ago |
| Language | Python | - |
| License | GNU General Public License v3.0 or later | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DALL-E
- lofi nuclear war to relax and study to
- [N] [D] OpenAI, who runs DALL-E 2, allegedly threatened the creator of DALL-E Mini
Code for https://arxiv.org/abs/2102.12092 found: https://github.com/openai/DALL-E
- A music-video generated by AI #Dalle2
I am developing an online game in which you generate artworks using AI and run an art gallery. You can auction/display your art to other players and visit their galleries. These are examples of artworks based on the prompt "hellgate". Which do you like best? All used slightly different settings.
It does
This is not overfitting but something else, right?
Can somebody tell me a likely source of error here? Is the model's capacity not large enough to sufficiently capture the complexity of the data? I don't think so, since I am using a state-of-the-art convolutional VAE used by OpenAI for a dataset comprising millions of images.
[N] OpenAI has released the encoder and decoder for the discrete VAE used for DALL-E
Issue: Any plan on releasing the text encoder? https://github.com/openai/DALL-E/issues/4
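One detail worth knowing when experimenting with the released encoder/decoder: the repo squeezes pixel values away from 0 and 1 before encoding (a "logit-Laplace" parameterization). Below is a simplified NumPy sketch of the `map_pixels`/`unmap_pixels` helpers from the repo's `dall_e/utils.py`; the 0.1 epsilon matches that file, but the original's dtype checks are omitted here.

```python
import numpy as np

# Epsilon used by the released dVAE (dall_e/utils.py); the logit-Laplace
# output distribution needs pixel values bounded away from 0 and 1.
LOGIT_LAPLACE_EPS = 0.1

def map_pixels(x):
    """Squeeze pixels from [0, 1] into [eps, 1 - eps] before encoding."""
    return (1 - 2 * LOGIT_LAPLACE_EPS) * x + LOGIT_LAPLACE_EPS

def unmap_pixels(x):
    """Invert map_pixels after decoding, clamping back into [0, 1]."""
    return np.clip((x - LOGIT_LAPLACE_EPS) / (1 - 2 * LOGIT_LAPLACE_EPS), 0, 1)
```

The two functions are exact inverses on [0, 1], so reconstructions round-trip cleanly through the transform.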
- DALL-E Paper and Code
dalle-2-preview
The AI Art Apocalypse
DALL-E's docs, for example, mention that it can output whole copyrighted logos and characters[1], and acknowledge that it is possible to generate human faces that bear the likeness of those in the training data. We've also seen people recently critique Stable Diffusion's output for attempting to recreate artists' signatures that came from the commercial training data.
That said, at a certain point the kinks will be ironed out, and such systems will likely skirt these issues by incorporating and manipulating just enough to be considered fair use and creative transformation.
[1] "The model can generate known entities including trademarked logos and copyrighted characters." https://github.com/openai/dalle-2-preview/blob/main/system-c...
Unpopular opinion: the rise of dalle mini has destroyed chances of this going fully public
Photorealistic human faces are not to be shared publicly per this blog post. This document from OpenAI contains some though.
DALL-E Mini seems to distort faces in a similar way to how swastikas are distorted (compare the monster generations). DALL-E 2's documentation says "We also used advanced techniques to prevent photorealistic generations of real individuals’ faces", and I think that means using distortion...
There are three different AIs involved. DALL-E (1) and DALL-E 2 are from OpenAI, while DALL-E Mini is not. DALL-E 2's architecture is very different from that of DALL-E (1), and DALL-E Mini is architecturally much closer to DALL-E (1) than to DALL-E 2. DALL-E Mini uses VQGAN, but DALL-E (1) does not. This document from OpenAI shows many photorealistic DALL-E 2 faces.
This document about DALL-E 2 is from OpenAI. Do you see the faces there? It is linked from the official DALL-E 2 page. I have also seen DALL-E 2-generated images with photorealistic faces from other users, but OpenAI doesn't want those posted publicly, per, for example, this blog post.
A challenger approaches...
OpenAI's system card has a section on bias and representation. A couple of examples:
DALL-E 2 open source implementation
In case anyone else is put off by the link referencing an answer that then links to something else with most likely higher hardware requirements that are not stated, the end of the rabbit hole seems to be here: https://github.com/openai/dalle-2-preview/issues/6#issuecomm...
TL;DR: A single NVidia A100 is most likely sufficient; with a lot of optimization and stepwise execution a single 3090 Ti might also be within the realm of possibility.
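To see why those cards are plausible, a rough weight-only memory estimate helps. The sketch below uses illustrative parameter counts (the decoder and prior sizes are assumptions loosely based on published descriptions of DALL-E 2, not figures stated in the linked issue), and ignores activations and attention buffers, which is exactly the overhead that stepwise execution on a smaller card tries to work around.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """GiB needed just to hold the weights (fp16 = 2 bytes per parameter).

    Activations, optimizer state, and attention buffers come on top of
    this, which is why loading one sub-model at a time helps on cards
    with less VRAM.
    """
    return n_params * bytes_per_param / 2**30

# Hypothetical component sizes for illustration only:
decoder_gib = weight_memory_gib(3.5e9)  # ~6.5 GiB in fp16
prior_gib = weight_memory_gib(1.0e9)    # ~1.9 GiB in fp16
```

Even with generous headroom for activations, that fits in an A100's 40 GiB; on a 24 GiB 3090 Ti it only fits if the sub-models are swapped in and out rather than held resident together.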
A music-video generated by AI #Dalle2
OpenAI is still working on intellectual property issues per this document.
Dall-E 2 illustrations of Twitter BIOS
It's making art that Silicon Valley people like because it's being given absurdly stereotypically "Bay Area Twitter Loving AI Person" drawing prompts. DALL-E can make other styles of art or just photos quite easily, look at the samples for simpler and more normal prompts here:
https://github.com/openai/dalle-2-preview/blob/main/system-c...
The art style is a direct consequence of the fact that apparently not one of the people this guy follows on Twitter is a normal person - they're all psychedelic-obsessed AI researchers whose Twitter bios are chosen to be as abstract and weird as possible. So the AI does what it's told and creates abstract, weird art as it tries to interpret stuff like "commitments empathetic, psychedelic, philosophical" or "cottagecore tech-adjacent young robert moses". I think it did an amazing job, honestly.
The real social issue we should be debating here is whether the sort of people who work at OpenAI can be trusted to make honest, normal AI to begin with. I remember seeing a comment on HN some years ago to the effect of "AI safety is what happens when hard left social activists discover that there's no way to train AI on the writings of normal people without it thinking like a normal person".
The document I linked above is mostly about horrors like the model creating photos of a white male builder when prompted with "photo of a builder". It's full of weird, stunted quasi-English like: the prompt “lawyer” results disproportionately in images of people who are White-passing and male-passing in Western dress, while the prompt “nurse” tends to result in images of people who are female-passing. What does that even mean? Presumably this is the latest iteration of trans related language games that the rest of us didn't get the memo on?
Like always with OpenAI, they train an AI and then freak out when it describes the world as it actually is. The real AI safety question is not DALL-E in its current state; it's whether the final AI that they release to the public will be "safe" in the sense of actually understanding reality, or whether it will exist in some bizarre, non-existent SJW dystopia in which builders are always black women and white men don't exist at all.
Horse-riding astronaut is a milestone in AI’s journey to make sense of the world
As someone who works in CG and has had a strong interest in AI for almost 40 years, I have to say the examples you gave give me the opposite impression. I think they're fantastic, other than what the developers highlight - that they are racially biased due to the input sources. But from an image-generation perspective, I'm blown away.
A couple of these lawyer images have issues (one is holding a book hilariously called "LAWER"):
https://github.com/openai/dalle-2-preview/raw/main/assets/Mo...
What are some alternatives?
dalle-mini - DALL·E Mini - Generate images from a text prompt
DALLE-pytorch - Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
DALLE2-pytorch - Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
latent-diffusion - High-Resolution Image Synthesis with Latent Diffusion Models
disco-diffusion
big-sleep - A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun
pixray
glide-text2im - GLIDE: a diffusion-based text-conditional image synthesis model
CLIP - CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
gpt-3 - GPT-3: Language Models are Few-Shot Learners