GitHub link, for those looking for the code: https://github.com/dome272/Paella
Fully correct. Also, v2 of the paper introduced a model that is bigger and slower but generates better images, so the 500ms figure applies only to the first model, which we introduced in v1. I also want to mention our new work, since it is closely related to this whole topic of "speeding up models" (either training or sampling): Würstchen: https://github.com/dome272/wuerstchen/
- The gain from torch.compile in Stable Diffusion is modest (15%-25% last I checked?)
- Torch 2.0 only supports static inputs. In actual usage scenarios, this means frequent lengthy recompiles. Eventually, these recompiles will overload the compilation cache and torch.compile will stop functioning.
- Some common augmentations (like TomeSD) break compilation, make it take forever, or kill the performance gains.
- Other miscellaneous bugs (like freezing the Python thread and causing timeouts in web UIs, or errors with embeddings)
- Dynamic input in the Torch 2.1 nightly fixes a lot of these issues, but it may only have started working about a week ago. See https://github.com/pytorch/pytorch/issues/101228#issuecommen...
- TVM and AITemplate have massive performance gains. ~2x or more for AIT, not sure about an exact number for TVM.
- AIT supported dynamic input before torch.compile did, and requires no recompilation after the initial compile. Also, weights (models and LoRAs) can be swapped out without a recompile.
- TVM supports very performant Vulkan inference, which would massively expand hardware compatibility.
Note that the popular SD Web UIs don't support any of this, with two exceptions: VoltaML (with WIP AIT support) and the Windows DirectML fork of A1111 (which uses optimized ONNX models, I think).
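To picture the static-input problem from the list above, here is a toy model in plain Python. It is a sketch under stated assumptions, not how torch.compile is actually implemented: the class `ShapeSpecializedCompiler`, the `simulate` helper, and the cache limit of 4 are all hypothetical, chosen only to show why changing image sizes forces recompiles and eventually exhausts a shape-keyed cache. In this sketch, the assumed behavior once the cache is full is a fall back to uncompiled execution.

```python
from typing import Callable, Dict, Iterable, Tuple

Shape = Tuple[int, ...]


class ShapeSpecializedCompiler:
    """Toy stand-in for a compiler that specializes on exact input shapes."""

    def __init__(self, cache_limit: int = 4) -> None:
        self.cache_limit = cache_limit  # hypothetical limit, for illustration only
        self.cache: Dict[Shape, Callable[[], str]] = {}
        self.compile_count = 0   # how many (slow) compiles happened
        self.fallback_count = 0  # how many calls ran without a compiled kernel

    def _compile(self, shape: Shape) -> Callable[[], str]:
        # Stands in for a lengthy compile step specialized to one shape.
        self.compile_count += 1
        return lambda: f"compiled kernel for {shape}"

    def run(self, shape: Shape) -> str:
        if shape in self.cache:
            # Same shape as before: reuse the compiled kernel.
            return self.cache[shape]()
        if len(self.cache) >= self.cache_limit:
            # Cache is full: assume we give up and run uncompiled.
            self.fallback_count += 1
            return "uncompiled fallback"
        # New shape with room in the cache: pay for a recompile.
        self.cache[shape] = self._compile(shape)
        return self.cache[shape]()


def simulate(resolutions: Iterable[Tuple[int, int]], cache_limit: int = 4) -> Tuple[int, int]:
    """Return (compile_count, fallback_count) for a sequence of image sizes."""
    compiler = ShapeSpecializedCompiler(cache_limit)
    for height, width in resolutions:
        # Stable Diffusion latents have 4 channels at 1/8 of the image resolution.
        compiler.run((1, 4, height // 8, width // 8))
    return compiler.compile_count, compiler.fallback_count


if __name__ == "__main__":
    same_size = [(512, 512)] * 5
    mixed = [(512, 512), (512, 768), (768, 512), (640, 640), (768, 768), (1024, 1024)]
    print("fixed resolution:", simulate(same_size))
    print("mixed resolutions:", simulate(mixed))
```

With one fixed resolution the toy compiler compiles once and reuses the result. With a user who keeps changing resolution, every new size costs another compile until the cache fills. A compiler that handles dynamic shapes avoids this by compiling once for a range of sizes.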