gpt-fast vs stable-fast

| | gpt-fast | stable-fast |
|---|---|---|
| Mentions | 8 | 11 |
| Stars | 5,152 | 965 |
| Growth | 3.5% | - |
| Activity | 8.3 | 9.4 |
| Latest commit | 3 days ago | about 2 months ago |
| Language | Python | Python |
| License | BSD 3-Clause "New" or "Revised" License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
gpt-fast
- [D] GPT-Fast performance on larger batch sizes
  I'm toying around with gpt-fast (https://github.com/pytorch-labs/gpt-fast) and was wondering if anyone has run experiments at batch sizes > 1?
- Optimum-NVIDIA - 28x faster inference in just 1 line of code!?
- GPT-Fast: Simple and efficient GPT inference in <1000 LOC of Python
- GPT-Fast: A fast and hackable implementation of transformer inference in <1000 lines of native PyTorch, with support for quantization, speculative decoding, tensor parallelism (TP), NVIDIA/AMD GPUs, and more!
  And check out the code here: https://github.com/pytorch-labs/gpt-fast
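Since the post highlights speculative decoding, here is a minimal, illustrative sketch of the idea in plain PyTorch (greedy verification, batch size 1). The `target` and `draft` callables returning logits are assumptions for the example, not gpt-fast's actual API:

```python
import torch

@torch.no_grad()
def speculative_decode(target, draft, ids, k=4, rounds=16):
    # Greedy speculative decoding: a small draft model proposes k tokens,
    # the large target model verifies them in a single forward pass, and
    # the longest agreeing prefix is kept, plus one token from the target
    # itself. target(ids) / draft(ids) are assumed to return logits of
    # shape [batch, seq_len, vocab].
    for _ in range(rounds):
        # 1) Draft proposes k tokens autoregressively (cheap per step).
        draft_ids = ids
        for _ in range(k):
            next_tok = draft(draft_ids)[:, -1].argmax(-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_tok], dim=-1)
        proposed = draft_ids[:, ids.shape[1]:]
        # 2) One target forward pass scores all k proposals at once.
        logits = target(draft_ids)[:, ids.shape[1] - 1:]  # k + 1 predictions
        verified = logits.argmax(-1)
        # 3) Accept the longest prefix where target and draft agree,
        #    then append the target's own next token as a "free" token.
        n = int((verified[:, :k] == proposed).long().cumprod(-1).sum())
        ids = torch.cat([ids, proposed[:, :n], verified[:, n:n + 1]], dim=-1)
    return ids
```

Every accepted prefix amortizes one expensive target forward pass over several tokens, which is why the technique speeds up memory-bound decoding without changing the output distribution (under greedy decoding, the result matches running the target alone).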
- 80% faster, 50% less memory, 0% loss of accuracy Llama finetuning
  How does this compare to the PyTorch Labs optimizations for SAM and Llama 2?
  https://github.com/pytorch-labs/segment-anything-fast
  https://github.com/pytorch-labs/gpt-fast
- Fast and hackable PyTorch native transformer inference
- Accelerating Generative AI with PyTorch II: GPT, Fast
  I'm wondering if gpt-fast has a version that can be run from the Windows Command Prompt or PowerShell?
  https://github.com/pytorch-labs/gpt-fast/issues/45
stable-fast
- Has anyone managed to get TensorRT working in ComfyUI on Windows?
  Download (https://github.com/chengzeyi/stable-fast/releases) and install the stable-fast binary compiled for your system: pip install stable_fast-0.0.13.post3+torch210cu118-cp310-cp310-win_amd64.whl
- Optimum-NVIDIA - 28x faster inference in just 1 line of code!?
- stable-fast for SD inference: Faster than AITemplate, On par with TensorRT
- [N] stable-fast for SD inference: Faster than AITemplate, On par with TensorRT
- Stable-fast for SD inference: Faster than AITemplate, On par with TensorRT
- SDXL Turbo: A Real-Time Text-to-Image Generation Model
  SDXL and ControlNet are already optimized, if that's what you mean: https://github.com/chengzeyi/stable-fast
  (Note the links to various SD compilers.)
  But the whole field is moving so fast that people aren't even adopting the compilers at large.
- Getting sub 100ms refresh rate on LCMs
  > already compiling
  Hmm, well if you mean torch.compile, y'all should still check out stable-fast, which is claiming ~16ms/iter on a 4090:
  https://github.com/chengzeyi/stable-fast#rtx-4090-512x512-ba...
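For reference, the torch.compile route the commenter contrasts with stable-fast looks roughly like this with Diffusers (the model ID is just an example):

```python
import torch
from diffusers import StableDiffusionPipeline

# Baseline PyTorch 2.x route: compile the UNet, which dominates runtime.
# "reduce-overhead" enables CUDA graphs to cut per-step launch overhead.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
image = pipe("a photo of an astronaut riding a horse").images[0]
```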
- Generate images fast with SD 1.5 while typing on Gradio
  Now combine this with an optimized SD implementation, like:
  https://github.com/chengzeyi/stable-fast
  Or AITemplate, and you are at 15 FPS on a larger consumer GPU, or 10 FPS with a ControlNet you can use for some motion consistency.
- S-LoRA: Serving Concurrent LoRA Adapters
  Since I am sending you down the rabbit hole anyway, you should check out sfast:
  https://github.com/chengzeyi/stable-fast
  It's the most promising "fast" and flexible stable diffusion implementation akin to this paper or vLLM that I know of. It doesn't have as many caveats as other implementations, like AITemplate (which is basically Turing+ and Linux only) or torch.compile (basically no support for changing inputs/LoRAs without recompiling).
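The torch.compile caveat mentioned above is about recompilation: by default the compiler specializes on input shapes, so changing resolutions or prompt lengths triggers fresh compiles unless you opt into symbolic shapes, e.g.:

```python
import torch

@torch.compile(dynamic=True)  # trace with symbolic shapes instead of specializing
def fused_gate(x):
    return torch.nn.functional.silu(x) * x

fused_gate(torch.randn(1, 64))  # first call compiles
fused_gate(torch.randn(1, 77))  # new length reuses the dynamic graph
```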
- 🚀 Announcing stable-fast v0.0.5: Speed Optimization for SDXL, Dynamic CUDA Graph
  About 2 weeks ago, I released the stable-fast project, a lightweight inference performance optimization framework for HuggingFace Diffusers. It provides the best performance while keeping compilation dynamic and flexible, and it supports ControlNet and LoRA seamlessly.
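Based on the project's README, usage follows a compile-the-pipeline pattern roughly like the sketch below. The exact module path and config fields have changed across stable-fast versions, so treat this as an assumption to check against the repo:

```python
import torch
from diffusers import StableDiffusionPipeline
# Module path per the stable-fast README; may differ in older releases.
from sfast.compilers.diffusion_pipeline_compiler import (
    compile, CompilationConfig)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

config = CompilationConfig.Default()
config.enable_xformers = True    # optional accelerators, if installed
config.enable_triton = True
config.enable_cuda_graph = True  # the "dynamic CUDA graph" feature above

pipe = compile(pipe, config)     # pipeline stays dynamic; LoRA/ControlNet keep working
image = pipe("a photo of an astronaut riding a horse").images[0]
```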
What are some alternatives?
unsloth - Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory
Fooocus - Focus on prompting and generating
TensorRT-LLM - TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
hyperlearn - 2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.
optimum-nvidia
segment-anything-fast - A batched offline inference oriented version of segment-anything