| | x-stable-diffusion | TensorRT |
|---|---|---|
| Mentions | 5 | 5 |
| Stars | 546 | 2,343 |
| Growth | -0.5% | 1.9% |
| Activity | 4.5 | 9.5 |
| Last commit | 5 months ago | 6 days ago |
| Language | Jupyter Notebook | Python |
| License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
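As a rough illustration of a percentile-style score like the one described above (this is not the site's actual weighting formula, and `activity_score` is a hypothetical helper), a value of 9.0 on a 0-10 scale simply means the project outranks about 90% of tracked projects:

```python
def activity_score(project_commits, all_projects_commits):
    """Toy percentile-style activity score on a 0-10 scale.

    9.0 means the project is more active than ~90% of tracked
    projects. Illustrative only; the real metric also weights
    recent commits more heavily than older ones.
    """
    below = sum(1 for c in all_projects_commits if c < project_commits)
    return round(10 * below / len(all_projects_commits), 1)

# A project more active than 9 of the 10 tracked projects scores 9.0
print(activity_score(95, [5, 10, 20, 30, 40, 50, 60, 70, 80, 95]))  # → 9.0
```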
x-stable-diffusion
- [D] Is there an affordable way to host a diffusers Stable Diffusion model publicly on the Internet for "real-time" inference? (CPU or serverless GPU?)
  Cheapest would be to deploy it yourself using https://github.com/stochasticai/x-stable-diffusion. Let me know if you need more help with real-time inference.
- [D] Deploy stable diffusion
  However, I suggest you accelerate your inference first. For example, you can use open-source inference engines (see https://github.com/stochasticai/x-stable-diffusion) to easily accelerate your inference 2x or more. That means you can generate 2x more images per dollar on public clouds.
- 30% Faster than xformers? voltaML vs xformers stable diffusion - NVIDIA 4090
  Brilliant. The x-stable-diffusion TensorRT/AITemplate etc. sample images suggested the outputs weren't consistent between the optimizations at all, unless they hadn't locked the seed, which would have been foolish for the test.
- Up to 2.5X speed-up of Stable Diffusion/DreamBooth using one line of code with voltaML
  I was looking at this three days ago. The problem is that there seems to be a huge difference in what is being generated, judging by the example spread on https://github.com/stochasticai/x-stable-diffusion, whereas copying the model, params, and seed should give a near-identical image.
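To illustrate why locking the seed matters in a comparison like this, here is a minimal stdlib sketch (real pipelines seed a `torch.Generator` instead, but the principle is the same): identical seeds reproduce identical samples, so any remaining difference must come from the optimization itself.

```python
import random

def sample_latents(seed, n=4):
    """Draw pseudo 'latent' values; the same seed yields the same draws."""
    rng = random.Random(seed)
    return [round(rng.gauss(0, 1), 4) for _ in range(n)]

a = sample_latents(42)
b = sample_latents(42)   # same seed  -> identical "image"
c = sample_latents(43)   # new seed   -> different "image"
print(a == b, a == c)    # True False
```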
- Using Tensor Cores for Deep Learning.
TensorRT
- Learn TensorRT optimization
- I made a TensorRT example. I hope this will help beginners. I also have a question about TensorRT best practices.
- [P] 4.5 times faster Hugging Face transformer inference by modifying some Python AST
  Have you tried the new Torch-TensorRT compiler from NVIDIA?
- PyTorch 1.10
  You can also quantize your model to FP16 or Int8 using post-training quantization (PTQ), which should give you an additional inference speed-up.
Here is a tutorial[2] to leverage TRTorch.
[1] https://github.com/NVIDIA/TRTorch/tree/master/core
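The Int8 PTQ idea mentioned above can be sketched without any framework. Below is a toy symmetric per-tensor int8 quantization of a weight list (illustrative only; TRTorch's actual calibration is more sophisticated, and `quantize_int8` is a hypothetical helper):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 PTQ: w ≈ scale * q, with q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0  # assumes non-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [scale * v for v in q]

w = [0.5, -1.27, 0.031, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(max_err, 4))  # → [50, -127, 3, 100] 0.001
```

The speed-up comes from doing the matrix math on the small `q` integers; the error is bounded by half a quantization step, which is why accuracy usually survives PTQ.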
What are some alternatives?
voltaML - ⚡VoltaML is a lightweight library to convert and run your deep learning models in high-performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.
torch2trt - An easy to use PyTorch to TensorRT converter
AITemplate - AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
sd_dreambooth_extension
cutlass - CUDA Templates for Linear Algebra Subroutines
infery-examples - A collection of demo-apps and inference scripts for various deep learning frameworks using infery (Python).
onnx-simplifier - Simplify your onnx model
jukebox - Code for the paper "Jukebox: A Generative Model for Music"
TensorRT - NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
sdui - Local ImGui UI for Stable Diffusion. Features embedded PNG metadata, Apple M1 fixes, result caching, img2img, and more!
transformer-deploy - Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀