DALL-E 2 open source implementation

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • DALLE2-pytorch

    Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch

  • Not a complete answer to your question but you may find this discussion interesting:

    https://github.com/lucidrains/DALLE2-pytorch/discussions/10

    Inference cost and scale seem to be much more favourable than for large language models (for now).

  • CLIP

    CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

  • This uses OpenAI’s CLIP model which is open source: https://github.com/openai/clip
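The matching step CLIP performs — embed an image and several candidate captions into a shared space, then rank the captions by cosine similarity — can be sketched with toy vectors. This is a hand-rolled illustration, not the real `clip` API from the linked repo, and the embeddings below are made up:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_snippets(image_emb, text_embs):
    """Rank candidate text embeddings by similarity to the image
    embedding, most relevant first (CLIP's zero-shot matching idea)."""
    scores = [cosine(image_emb, t) for t in text_embs]
    return sorted(range(len(scores)), key=lambda i: -scores[i]), scores

# made-up embeddings: the first caption points almost the same way as the image
ranking, scores = rank_snippets([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
```

In the real model, the two embeddings come from separate image and text encoders trained jointly, but the ranking step is exactly this similarity comparison.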

  • latent-diffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

  • An older, but similar and still impressive alternative is available here: https://github.com/CompVis/latent-diffusion

    If you have a decent amount of VRAM, you can use it to start generating images with their pre-trained models. They're nowhere near as impressive as DALL-E 2, but they're still pretty damn cool. I don't know what the exact memory requirements are, but I've gotten it to run on a 1080 Ti with 11 GB.

  • disco-diffusion

  • Also very interested in this. AFAIK, the best alternative to DALL-E-style generation is CLIP-guided generation (such as Disco Diffusion [1] and MidJourney [2]), which can take anywhere from 1 to 20 minutes on an RTX A5000.

    [1]: https://github.com/alembics/disco-diffusion
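"CLIP-guided" means repeatedly nudging an image (or latent) in the direction that makes CLIP score it as more similar to the prompt. A minimal sketch of that loop, with made-up embeddings and a hand-rolled similarity — none of this is Disco Diffusion's actual API:

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def clip_guided_step(latent, text_emb, lr=0.1):
    """One toy guidance step: move the latent along the text-embedding
    direction, increasing image-text similarity. Real CLIP guidance
    backpropagates the similarity score through a diffusion model instead."""
    return [x + lr * t for x, t in zip(latent, text_emb)]

latent = [1.0, -0.5, 0.2]       # stand-in for a random initial image/latent
text_emb = [0.0, 1.0, 0.0]      # hypothetical embedding of the prompt
before = cos_sim(latent, text_emb)
for _ in range(50):
    latent = clip_guided_step(latent, text_emb)
after = cos_sim(latent, text_emb)
```

The many-minute runtimes quoted above come from running hundreds of such steps, each requiring a full forward and backward pass through CLIP and the diffusion model.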

  • shared-tensor

    A distributed, shared tensor with high-performance approximate updates for machine learning

  • This needs distributed training...

    Years ago I made a shared tensor library[1] which should allow people to do training in a distributed fashion around the world. Even with relatively slow internet connections, training should still make good use of all the compute available because the whole lot runs asynchronously with highly compressed and approximate updates to shared weights.

    The end result is that every bit of computation added has some benefits.

    Obviously for a real large scale effort, anti-cheat and anti-spam mechanisms would be needed to ensure nodes aren't deliberately sending bad data to hurt the group effort.

    [1]: https://github.com/Hello1024/shared-tensor
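The "highly compressed and approximate updates" idea can be illustrated with top-k gradient sparsification: each node ships only the largest-magnitude gradient entries over the slow link. This is a toy sketch under that assumption — the function names are invented, not the shared-tensor library's API:

```python
def compress_update(gradient, k=2):
    """Keep only the k largest-magnitude entries of a local gradient,
    so only a fraction of the values cross the network."""
    top = sorted(range(len(gradient)), key=lambda i: -abs(gradient[i]))[:k]
    return {i: gradient[i] for i in top}

def apply_update(weights, sparse_update, lr=0.1):
    """Apply a received sparse update to the shared weights."""
    for i, g in sparse_update.items():
        weights[i] -= lr * g
    return weights

weights = [0.0, 0.0, 0.0, 0.0]
grad = [0.5, -0.01, 2.0, 0.03]        # hypothetical local gradient
update = compress_update(grad, k=2)   # only 2 of 4 values are transmitted
weights = apply_update(weights, update)
```

Because nodes apply whatever sparse updates arrive, whenever they arrive, the scheme is naturally asynchronous — which is what lets slow home connections still contribute useful compute.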

  • dalle-2-preview

  • In case anyone else is put off by the link pointing to an answer that in turn links elsewhere, with likely higher (and unstated) hardware requirements, the end of the rabbit hole seems to be here: https://github.com/openai/dalle-2-preview/issues/6#issuecomm...

    TL;DR: a single NVIDIA A100 is most likely sufficient; with a lot of optimization and stepwise execution, a single 3090 Ti might also be within the realm of possibility.

  • big-sleep

    A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun

  • and after a few hours got this: https://i.imgur.com/FxdfdmV.png

    Not nearly as cool as the real DALL-E, but maybe I'm missing something.

    [1] https://github.com/lucidrains/big-sleep
