DALL-E vs dalle-2-preview

Compare DALL-E vs dalle-2-preview and see what their differences are.

DALL-E

PyTorch package for the discrete VAE used for DALL·E. (by openai)
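For context, this repository ships only the discrete VAE (dVAE) used to turn images into a grid of tokens and back, not the text-to-image transformer itself. The sketch below round-trips an image through the released encoder and decoder, loosely following the repo's usage notebook; the helper names (load_model, map_pixels, unmap_pixels) and checkpoint URLs are taken from that notebook as I recall them, and the input filename and preprocessing are placeholder assumptions, so verify everything against the current README.

    # Minimal round-trip through the dVAE from openai/DALL-E (a sketch, not
    # the official example). Assumes: pip install dall-e torch torchvision,
    # and that "input.jpg" is any local RGB image (placeholder path).
    import torch
    import torch.nn.functional as F
    import torchvision.transforms as T
    from PIL import Image

    from dall_e import load_model, map_pixels, unmap_pixels

    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Checkpoint URLs as published in the repo's notebook (verify there).
    enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", dev)
    dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", dev)

    # Preprocess to 256x256 and map into the pixel range the dVAE expects.
    img = Image.open("input.jpg").convert("RGB")
    x = T.Compose([T.Resize(256), T.CenterCrop(256), T.ToTensor()])(img)
    x = map_pixels(x.unsqueeze(0)).to(dev)

    with torch.no_grad():
        z_logits = enc(x)                                     # [1, vocab, 32, 32]
        z = torch.argmax(z_logits, dim=1)                     # discrete token ids
        z = F.one_hot(z, num_classes=enc.vocab_size)          # one-hot codes
        z = z.permute(0, 3, 1, 2).float()
        x_stats = dec(z).float()
        x_rec = unmap_pixels(torch.sigmoid(x_stats[:, :3]))   # back to pixels

    T.ToPILImage()(x_rec[0].cpu()).save("reconstruction.jpg")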
             DALL-E                                     dalle-2-preview
Mentions     31                                         61
Stars        10,692                                     1,049
Growth       0.6%                                       0.0%
Activity     0.0                                        1.8
Last commit  about 2 months ago                         over 1 year ago
Language     Python
License      GNU General Public License v3.0 or later   -
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
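The scoring formula behind that activity number isn't published, so the snippet below is only a hypothetical illustration of how a recency-weighted score could work: each commit contributes a weight that decays with age (the 30-day half-life is an arbitrary assumption), so recent commits dominate the total.

    # Hypothetical recency-weighted activity score (illustration only; not
    # the site's actual formula). Each commit's weight halves every
    # `half_life_days`, so newer commits count for more than older ones.
    from datetime import datetime, timezone

    def activity_score(commit_dates, half_life_days=30.0):
        now = datetime.now(timezone.utc)
        score = 0.0
        for d in commit_dates:
            age_days = (now - d).total_seconds() / 86400.0
            score += 0.5 ** (age_days / half_life_days)
        return score

    # Example: the most recent commit contributes the most weight.
    commits = [
        datetime(2022, 8, 1, tzinfo=timezone.utc),
        datetime(2022, 6, 15, tzinfo=timezone.utc),
        datetime(2021, 12, 1, tzinfo=timezone.utc),
    ]
    print(round(activity_score(commits), 2))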

DALL-E

Posts with mentions or reviews of DALL-E. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-06-25.

dalle-2-preview

Posts with mentions or reviews of dalle-2-preview. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-08-16.
  • The AI Art Apocalypse
    3 projects | news.ycombinator.com | 16 Aug 2022
DALL-E's docs, for example, mention it can output whole copyrighted logos and characters[1] and acknowledge it's possible to generate human faces that bear the likeness of those in the training data. We've also seen people recently critique Stable Diffusion's output for attempting to recreate artists' signatures that came from the commercial training data.

    That said, by a certain point the kinks will be ironed out, and these systems will likely skirt around such issues by incorporating/manipulating just enough to be considered fair use and creative transformation.

    [1] "The model can generate known entities including trademarked logos and copyrighted characters." https://github.com/openai/dalle-2-preview/blob/main/system-c...

  • Unpopular opinion: the rise of dalle mini has destroyed chances of this going fully public
    2 projects | /r/dalle2 | 14 Jun 2022
    Photorealistic human faces are not to be shared publicly per this blog post. This document from OpenAI contains some, though.
  • DALL-E Mini seems to distort faces in a similar way that swastikas are distorted - compare to the monster generation - DALL-E 2 says "We also used advanced techniques to prevent photorealistic generations of real individuals’ faces" and I think that means using distortion...
    2 projects | /r/OpenAI | 13 Jun 2022
    There are 3 different AIs involved. DALL-E (1) and DALL-E 2 are from OpenAI, while DALL-E Mini is not. DALL-E 2's architecture is much different than that of DALL-E (1). DALL-E Mini is architecturally much more like DALL-E (1) than DALL-E 2. DALL-E Mini uses VQGAN, but DALL-E (1) does not. This document from OpenAI shows many DALL-E 2 photorealistic faces.
    2 projects | /r/OpenAI | 13 Jun 2022
    This document about DALL-E 2 is from OpenAI. Do you see the faces there? It is linked from the official DALL-E 2 page. I have also seen DALL-E 2-generated images with photorealistic faces from other users, but OpenAI doesn't want those posted publicly per, for example, this blog post.
  • A challenger approaches...
    2 projects | /r/dalle2 | 10 Jun 2022
    OpenAI's system card has a section on bias and representation. A couple of examples:
  • DALL-E 2 open source implementation
    10 projects | news.ycombinator.com | 1 May 2022
    In case anyone else is put off by the link referencing an answer that then links to something else with most likely higher hardware requirements that are not stated, the end of the rabbit hole seems to be here: https://github.com/openai/dalle-2-preview/issues/6#issuecomm...

    TL;DR: A single NVIDIA A100 is most likely sufficient; with a lot of optimization and stepwise execution, a single 3090 Ti might also be within the realm of possibility.

  • A music-video generated by AI #Dalle2
    4 projects | /r/woahdude | 1 May 2022
    OpenAI is still working on intellectual property issues per this document.
  • Dall-E 2 illustrations of Twitter BIOS
    2 projects | news.ycombinator.com | 9 Apr 2022
    It's making art that Silicon Valley people like because it's being given absurdly stereotypically "Bay Area Twitter Loving AI Person" drawing prompts. DALL-E can make other styles of art or just photos quite easily, look at the samples for simpler and more normal prompts here:

    https://github.com/openai/dalle-2-preview/blob/main/system-c...

    The art style is a direct consequence of the fact that apparently not one of the people this guy follows on Twitter is a normal person - they're all psychedelic-obsessed AI researchers whose Twitter bios are chosen to be as abstract and weird as possible. So the AI does what it's told and creates abstract, weird art as it tries to interpret stuff like "commitments empathetic, psychedelic, philosophical" or "cottagecore tech-adjacent young robert moses". I think it did an amazing job, honestly.

    The real social issue we should be debating here is whether the sort of people who work at OpenAI can be trusted to make honest, normal AI to begin with. I remember seeing a comment on HN some years ago to the effect of "AI safety is what happens when hard left social activists discover that there's no way to train AI on the writings of normal people without it thinking like a normal person".

    The document I linked above is mostly about horrors like the model creating photos of a white male builder when prompted with "photo of a builder". It's full of weird, stunted quasi-English like: "the prompt “lawyer” results disproportionately in images of people who are White-passing and male-passing in Western dress, while the prompt “nurse” tends to result in images of people who are female-passing." What does that even mean? Presumably this is the latest iteration of trans-related language games that the rest of us didn't get the memo on?

    Like always with OpenAI, they train an AI and then freak out when it describes the world as it actually is. The real AI safety question is not DALL-E in its current state; it's whether the final AI that they release to the public will be "safe" in the sense of actually understanding reality, or whether it will exist in some bizarre, non-existent SJW dystopia in which builders are always black women and white men don't exist at all.

  • Horse-riding astronaut is a milestone in AI’s journey to make sense of the world
    2 projects | news.ycombinator.com | 8 Apr 2022
    As someone who works in CG and has had a strong interest in AI for almost 40 years, I have to say the examples you gave give me the opposite impression. I think they're fantastic, other than what the developers highlight - that they are racially biased due to the input sources. But from an image-generation perspective, I'm blown away.

    A couple of these lawyer images have issues (one is holding a book hilariously called "LAWER"):

    https://github.com/openai/dalle-2-preview/raw/main/assets/Mo...

What are some alternatives?

When comparing DALL-E and dalle-2-preview you can also consider the following projects:

dalle-mini - DALL·E Mini - Generate images from a text prompt

DALLE-pytorch - Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALLE2-pytorch - Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

latent-diffusion - High-Resolution Image Synthesis with Latent Diffusion Models

disco-diffusion

big-sleep - A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

pixray

glide-text2im - GLIDE: a diffusion-based text-conditional image synthesis model

CLIP - CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

gpt-3 - GPT-3: Language Models are Few-Shot Learners