Like Diffusion but Faster: The Paella Model for Fast Image Generation

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Paella

    Official Implementation of Paella https://arxiv.org/abs/2211.07292v2

  • GitHub repo for those looking for the code: https://github.com/dome272/Paella

  • Wuerstchen

    Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models

  • Fully correct. Also, v2 of the paper introduced a model that is bigger and slower but generates better images, so the 500ms figure applies only to the first model we introduced in v1. I also want to mention our new work, since it is very much related to this whole topic of speeding up models (either training or sampling): Würstchen: https://github.com/dome272/wuerstchen/ (a rough sketch of the sampling loop follows below).
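
    To make the speed point concrete, here is a rough sketch of a Paella-style sampling loop: the model predicts a distribution over discrete VQGAN tokens, and a shrinking fraction of token positions is re-randomized each step, so only a handful of steps is needed. All names here (sample_tokens, model, text_emb) are illustrative assumptions, not the official implementation; see the GitHub link above for the real code.

      import torch

      def sample_tokens(model, text_emb, steps=12, vocab_size=8192, shape=(32, 32)):
          # Start from fully random VQGAN token indices.
          tokens = torch.randint(vocab_size, (1, *shape))
          for i in range(steps):
              t = 1.0 - (i + 1) / steps            # noise level anneals toward 0
              logits = model(tokens, t, text_emb)  # assumed output: (1, vocab, H, W)
              probs = logits.softmax(dim=1).permute(0, 2, 3, 1)
              sampled = torch.distributions.Categorical(probs).sample()
              # Renoise: re-randomize a t-fraction of positions, so early steps
              # explore and the final step (t=0) keeps the model's prediction.
              mask = torch.rand(tokens.shape) < t
              noise = torch.randint(vocab_size, tokens.shape)
              tokens = torch.where(mask, noise, sampled)
          return tokens  # decode to pixels with the VQGAN decoder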

  • PyTorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • - The gain in Stable Diffusion is modest (15%-25%, last I checked?)

    - Torch 2.0's torch.compile only supports static input shapes. In actual usage scenarios, this means frequent, lengthy recompiles. Eventually these recompiles overload the compilation cache and torch.compile stops functioning.

    - Some common augmentations (like TomeSD) break compilation, make it take forever, or kill the performance gains.

    - Other miscellaneous bugs (like freezing the Python thread and causing timeouts in web UIs, or errors with embeddings)

    - Dynamic input shapes in the Torch 2.1 nightlies fix a lot of these issues, but they only started working maybe a week ago? See https://github.com/pytorch/pytorch/issues/101228#issuecommen...

    - TVM and AITemplate offer massive performance gains: ~2x or more for AIT; I'm not sure about an exact number for TVM.

    - AIT supported dynamic input before torch.compile did, and it requires no recompilation after the initial compile. Weights (models and LoRAs) can also be swapped out without a recompile.

    - TVM supports very performant Vulkan inference, which would massively expand hardware compatibility.

    Note that the popular SD web UIs don't support any of this, with two exceptions: VoltaML (with WIP AIT support) and the Windows DirectML fork of A1111 (which uses optimized ONNX models, I think). A sketch of the dynamic-shape torch.compile usage is shown below.
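
    As a rough illustration of the dynamic-input point above, this is what shape-polymorphic compilation looks like with torch.compile's dynamic=True flag (the model here is a stand-in, not an SD UNet):

      import torch
      import torch.nn as nn

      # Stand-in model; any nn.Module works for the demonstration.
      model = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))

      # dynamic=True requests shape-polymorphic kernels, so changing the batch
      # size (or, for SD, the latent resolution) should not force the lengthy
      # per-shape recompiles that static compilation incurs.
      compiled = torch.compile(model, dynamic=True)

      for batch in (1, 4, 7):        # varying shapes, one compile
          out = compiled(torch.randn(batch, 64))
          print(batch, tuple(out.shape))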
