AITemplate
DeepSpeed-MII
| | AITemplate | DeepSpeed-MII |
|---|---|---|
| Mentions | 37 | 6 |
| Stars | 4,455 | 1,652 |
| Growth | 1.3% | 8.3% |
| Activity | 8.7 | 8.6 |
| Latest commit | about 23 hours ago | 2 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
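The Growth and Activity definitions above can be made concrete with a small sketch. The page does not give its exact formula, so the 30-day half-life decay and the percentile-to-score mapping below are assumptions, chosen only to match the stated behavior (recent commits weigh more; 9.0 means top 10%). All function names are hypothetical.

```python
from __future__ import annotations

from datetime import date


def monthly_growth(stars_now: int, stars_month_ago: int) -> float:
    """Month-over-month growth in stars, as a percentage."""
    if stars_month_ago <= 0:
        raise ValueError("need a positive star count from one month ago")
    return 100.0 * (stars_now - stars_month_ago) / stars_month_ago


def weighted_commits(commit_dates: list[date], today: date,
                     half_life_days: float = 30.0) -> float:
    """Recency-weighted commit count.

    ASSUMPTION: each commit's weight halves every `half_life_days`;
    the decay actually used by the site is not stated.
    """
    return sum(0.5 ** ((today - d).days / half_life_days) for d in commit_dates)


def activity_score(project_weight: float, all_weights: list[float]) -> float:
    """Map a weighted commit count to a 0-10 scale by percentile.

    ASSUMPTION: score = 10 * (fraction of tracked projects with a lower
    weight), so 9.0 means more active than 90% of them (top 10%).
    """
    below = sum(1 for w in all_weights if w < project_weight)
    return round(10.0 * below / len(all_weights), 1)
```

For example, a project going from 1,000 to 1,013 stars in a month has `monthly_growth(1013, 1000)` of about 1.3%. Ranking by percentile is what makes Activity a relative number: the same commit history can score differently as the set of tracked projects changes.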
AITemplate
- Show HN: Shortbread, a web app that helps you create AI comics in minutes
VoltaML is a relatively vanilla diffusers-based backend, so it's not a hairy monster to hack like you may have seen with SAI-based UIs.
The AITemplate code is a lightly modified version of Facebook's example code, changed to get rid of small issues like VRAM spikes: https://github.com/facebookincubator/AITemplate/tree/main/ex...
InvokeAI is also diffusers based, but they seem to mess with the pipeline a bit more.
And anyway, all of that may be more useful as a reference for interesting features than as a backend to try to adopt.
- List of all the ways to improve performance for stable diffusion.
Let me know if you discover any more ways to improve SD. I am currently looking into Facebook's AITemplate: https://github.com/facebookincubator/AITemplate
- [R] AITemplate Python to AMD compiler {META}
- Nearly 2x speedup for SD rendering using AITemplate
Link to AITemplate itself: https://github.com/facebookincubator/AITemplate
- Render a neural network into CUDA/HIP code
- Render neural network into CUDA/HIP code
- AITemplate: a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
- A1111 vs Olive vs AITemplate.
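The mentions above describe AITemplate as a Python framework that "renders" a network into CUDA/HIP C++. As a toy illustration of that idea only (this is not AITemplate's API and has none of its Tensor Core GEMM or fusion logic), the sketch below uses Python's standard `string.Template` to render FP16 elementwise ops into CUDA source. The template, kernel names, and functions are hypothetical; compiling the output (e.g. with nvcc) is not shown.

```python
from __future__ import annotations

from string import Template

# Shared preamble: cuda_fp16.h provides the `half` type and FP16 helpers.
_HEADER = "#include <cuda_fp16.h>\n"

# Hypothetical kernel template: one thread per element, guarded against overrun.
# ${name}, ${dtype} and ${expr} are filled in by Python at "render" time.
_ELEMENTWISE = Template("""\
__global__ void ${name}(const ${dtype}* __restrict__ x, ${dtype}* __restrict__ y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = ${expr};
    }
}
""")


def render_elementwise(name: str, expr: str, dtype: str = "half") -> str:
    """Render a single elementwise op; `expr` is a C++ expression over x[i]."""
    return _ELEMENTWISE.substitute(name=name, dtype=dtype, expr=expr)


def render_module(ops: list[tuple[str, str]]) -> str:
    """Render a list of (kernel_name, expr) ops into one .cu source string."""
    return _HEADER + "\n" + "\n".join(render_elementwise(n, e) for n, e in ops)


if __name__ == "__main__":
    # A two-op "network": square, then scale by 0.5, both in FP16.
    print(render_module([
        ("square_fp16", "x[i] * x[i]"),
        ("half_scale_fp16", "x[i] * __float2half(0.5f)"),
    ]))
```

The design point is that the shape-independent structure lives in the template while Python decides what to substitute, so specialized source can be generated per model and then compiled ahead of time.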
DeepSpeed-MII
- Stable Diffusion plus DeepSpeed
- [D] When chatGPT stops being free: Run SOTA LLM in cloud
Microsoft/DeepSpeed-MII for up to a 40x reduction in inference cost on Azure. It also supports int8 and fp16 BLOOM out of the box, but it fails on Azure due to instance size.
- Image Creation Time for each GPU.
- Anyone tried DeepSpeed-MII with stablediffusion?
Haven't tried it yet but they have some example code here: https://github.com/microsoft/DeepSpeed-MII/blob/main/examples/local/txt2img-example.py
- [P] Pure C/C++ port of OpenAI's Whisper
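The BLOOM comment above (int8 and fp16 support, but failing on Azure because of instance size) comes down to simple arithmetic: weight memory is roughly parameter count × bytes per parameter. A minimal sketch, assuming the full 176-billion-parameter BLOOM and counting weights only; the GPU count and per-GPU memory in the check are hypothetical examples, not Azure instance specs.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache and framework overhead, so real usage is higher.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on(num_params: float, dtype: str, num_gpus: int, gb_per_gpu: float) -> bool:
    """Rough check: do the weights alone fit in the GPUs' combined memory?"""
    return weight_memory_gb(num_params, dtype) <= num_gpus * gb_per_gpu


BLOOM_PARAMS = 176e9  # assumption: the largest BLOOM variant, ~176B parameters

for dtype in ("fp16", "int8"):
    print(dtype, weight_memory_gb(BLOOM_PARAMS, dtype), "GB")
# fp16 352.0 GB
# int8 176.0 GB
```

On a hypothetical node with 8 GPUs of 40 GB each (320 GB total), `fits_on(BLOOM_PARAMS, "fp16", 8, 40)` is `False` while the int8 weights fit, which is why quantization and instance size matter so much for this model.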
What are some alternatives?
stable-diffusion-webui - Stable Diffusion web UI
whisper.cpp - Port of OpenAI's Whisper model in C/C++
nebuly - The user analytics platform for LLMs
petals - 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
xformers - Hackable and optimized Transformers building blocks, supporting a composable construction.
voltaML - ⚡VoltaML is a lightweight library to convert and run your ML/DL deep learning models in high performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.
whisper-rs - Rust bindings to https://github.com/ggerganov/whisper.cpp
stable-diffusion-tensorflow - Stable Diffusion in TensorFlow / Keras
XNNPACK - High-efficiency floating-point neural network inference operators for mobile, server, and Web
rocm-gfx803