Like Diffusion but Faster: The Paella Model for Fast Image Generation

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Paella

    Official Implementation of Paella https://arxiv.org/abs/2211.07292v2

  • GitHub repo for those looking for the code: https://github.com/dome272/Paella

  • Wuerstchen

    Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models

  • Fully correct. Also, v2 of the paper introduced a model that is bigger and slower but generates better images, so the 500ms figure applies only to the first model we introduced in v1. I also want to mention our new work, since it is very much related to this whole topic of speeding up models (either training or sampling): Würstchen: https://github.com/dome272/wuerstchen/ (a rough sketch of the sampling loop follows below).
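
    To make the speed point concrete, here is a rough sketch of a Paella-style sampling loop: the model predicts a distribution over discrete VQGAN tokens, and a shrinking fraction of token positions is re-randomized each step, so only a handful of steps is needed. All names here (sample_tokens, model, text_emb) are illustrative assumptions, not the official implementation; see the GitHub link above for the real code.

      import torch

      def sample_tokens(model, text_emb, steps=12, vocab_size=8192, shape=(32, 32)):
          # Start from fully random VQGAN token indices.
          tokens = torch.randint(vocab_size, (1, *shape))
          for i in range(steps):
              t = 1.0 - (i + 1) / steps            # noise level anneals toward 0
              logits = model(tokens, t, text_emb)  # assumed output: (1, vocab, H, W)
              probs = logits.softmax(dim=1).permute(0, 2, 3, 1)
              sampled = torch.distributions.Categorical(probs).sample()
              # Renoise: re-randomize a t-fraction of positions, so early steps
              # explore and the final step (t=0) keeps the model's prediction.
              mask = torch.rand(tokens.shape) < t
              noise = torch.randint(vocab_size, tokens.shape)
              tokens = torch.where(mask, noise, sampled)
          return tokens  # decode to pixels with the VQGAN decoder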

  • PyTorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • - The gain in Stable Diffusion is modest (15%-25%, last I checked?)

    - Torch 2.0's torch.compile only supports static input shapes. In actual usage scenarios, this means frequent, lengthy recompiles. Eventually these recompiles overload the compilation cache and torch.compile stops functioning.

    - Some common augmentations (like TomeSD) break compilation, make it take forever, or kill the performance gains.

    - Other miscellaneous bugs (like freezing the Python thread and causing timeouts in web UIs, or errors with embeddings)

    - Dynamic input shapes in the Torch 2.1 nightlies fix a lot of these issues, but they only started working maybe a week ago? See https://github.com/pytorch/pytorch/issues/101228#issuecommen...

    - TVM and AITemplate offer massive performance gains: ~2x or more for AIT; I'm not sure about an exact number for TVM.

    - AIT supported dynamic input before torch.compile did, and it requires no recompilation after the initial compile. Weights (models and LoRAs) can also be swapped out without a recompile.

    - TVM supports very performant Vulkan inference, which would massively expand hardware compatibility.

    Note that the popular SD web UIs don't support any of this, with two exceptions: VoltaML (with WIP AIT support) and the Windows DirectML fork of A1111 (which uses optimized ONNX models, I think). A sketch of the dynamic-shape torch.compile usage is shown below.
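
    As a rough illustration of the dynamic-input point above, this is what shape-polymorphic compilation looks like with torch.compile's dynamic=True flag (the model here is a stand-in, not an SD UNet):

      import torch
      import torch.nn as nn

      # Stand-in model; any nn.Module works for the demonstration.
      model = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))

      # dynamic=True requests shape-polymorphic kernels, so changing the batch
      # size (or, for SD, the latent resolution) should not force the lengthy
      # per-shape recompiles that static compilation incurs.
      compiled = torch.compile(model, dynamic=True)

      for batch in (1, 4, 7):        # varying shapes, one compile
          out = compiled(torch.randn(batch, 64))
          print(batch, tuple(out.shape))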
