> JAX
Yeah, check out their post: https://huggingface.co/blog/sdxl_jax
I dunno how expensive TPU instances are these days, but the performance is insane!
> We tried SDXL but found the quality improvement to be marginal.
Yeah, the vanilla HF diffusers pipe is unimpressive to me.
Try playing with this though, turn on FreeU and specify an anime style: https://github.com/MoonRide303/Fooocus-MRE
I have never gotten such high quality results from simple prompts, even in cloud models like Midjourney/GPT4. The question is how to port even part of that magic over to the diffusers pipeline...
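For anyone wondering what "turn on FreeU" actually does: it re-weights the UNet's backbone and skip-connection features at inference time, no retraining needed. Roughly, backbone features get amplified and the low-frequency band of skip features gets attenuated in the Fourier domain. Here's a rough numpy sketch of the skip-feature half of that idea; the function name, default scale, and radius are illustrative, not the actual FreeU/Fooocus-MRE parameters:

```python
import numpy as np

def freeu_filter_skip(skip_feat, scale=0.9, radius=1):
    """Attenuate the low-frequency band of a 2D skip-connection
    feature map in the Fourier domain (the FreeU skip-feature idea).
    scale < 1 dampens low frequencies; scale = 1 is a no-op."""
    f = np.fft.fftshift(np.fft.fft2(skip_feat))
    h, w = skip_feat.shape
    cy, cx = h // 2, w // 2
    # Scale a small square around the DC component (lowest frequencies)
    mask = np.ones((h, w))
    mask[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1] = scale
    out = np.fft.ifft2(np.fft.ifftshift(f * mask))
    return out.real

# Toy usage on a random "feature map"
x = np.random.rand(8, 8).astype(np.float32)
y = freeu_filter_skip(x)
```

The real thing operates per-channel on UNet tensors and also scales backbone channels by separate factors, but this is the gist of the frequency-domain trick.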
Also, VoltaML has a good reference GPU AITemplate SD 1.5 implementation:
https://github.com/VoltaML/voltaML-fast-stable-diffusion/tre...
The speed jump is massive on my desktop GPU, probably even more dramatic on cloud hardware, and it may support some things (weight swapping/LoRA swapping/resolution changing) better than JAX.
VoltaML is a relatively vanilla diffusers-based backend, so it's not a hairy monster to hack like the SAI-based UIs you may have seen.
The AITemplate code is a lightly modified version of Facebook's example code, fixing small issues like VRAM spikes: https://github.com/facebookincubator/AITemplate/tree/main/ex...
InvokeAI is also diffusers based, but they seem to mess with the pipeline a bit more.
And anyway, all that may be a better reference for interesting features rather than a backend to try and adopt.