flash-attention vs xformers
| | flash-attention | xformers |
|---|---|---|
| Mentions | 27 | 48 |
| Stars | 15,061 | 8,920 |
| Growth (monthly) | 4.3% | 2.4% |
| Activity | 9.2 | 9.4 |
| Latest commit | 5 days ago | 4 days ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
flash-attention
-
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision
1) Pretty much, it's mathematically equivalent. The only software issues are things like managing dependency versions and in-memory data formats, but Flash Attention 2 is already built into HuggingFace and other popular libraries. Flash Attention 3 probably will be soon, although it requires an H100 GPU to run. (See the sketch after this list for what the HuggingFace integration looks like.)
2) Flash Attention 2 added support for GQA in past version updates:
https://github.com/Dao-AILab/flash-attention
3) They're comparing this implementation of Flash Attention (which is written in raw CUDA C++) to the Triton implementation of a similar algorithm (which is written in Triton): https://triton-lang.org/main/getting-started/tutorials/06-fu...
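To make point 1 concrete, here is a minimal sketch of opting into FlashAttention-2 through 🤗 Transformers. The model id is an illustrative choice, and it assumes `pip install flash-attn` succeeded and an Ampere-or-newer CUDA GPU is available:

```python
# A minimal sketch of enabling FlashAttention-2 in Transformers (>= 4.36).
# Assumptions: flash-attn is installed and the GPU supports it; the model
# id below is hypothetical and just for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,                # the fused kernels need fp16/bf16
    attn_implementation="flash_attention_2",  # opt into the flash-attn kernels
).to("cuda")

inputs = tokenizer("FlashAttention is", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```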
-
How the Transformer Architecture Was Likely Discovered: A Step-by-Step Guide
If you're looking for an implementation, I highly recommend checking out flash attention [https://github.com/Dao-AILab/flash-attention]. It's my go-to, and far better than anything we could whip up here using just PyTorch or TensorFlow.
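Calling the kernel directly is also straightforward; a minimal sketch, assuming `pip install flash-attn` and a supported CUDA GPU, with tensors in the repo's documented (batch, seqlen, nheads, headdim) layout:

```python
# A minimal sketch of invoking the fused kernel directly via flash-attn.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # output has the same shape as q
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```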
-
Interactive Coloring with ControlNet
* Even if I bought a 3090, I would have to get a computer to go with it, along with a PSU and some cooling. Don't know where to start with that.
[1] https://github.com/Dao-AILab/flash-attention/issues/190
-
Coding Self-Attention, Multi-Head Attention, Cross-Attention, Causal-Attention
Highly recommend using Tri's implementation: https://github.com/Dao-AILab/flash-attention. Rotary should be built in, and some group overseas even contributed ALiBi.
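Both of those features, plus grouped-query attention, are exposed through the same entry point. A hedged sketch, assuming flash-attn >= 2.4 on a CUDA GPU, of GQA (k/v carrying fewer heads than q) together with ALiBi slopes:

```python
# A sketch of GQA + ALiBi via flash_attn_func; assumes flash-attn >= 2.4.
import torch
from flash_attn import flash_attn_func

batch, seqlen, headdim = 2, 512, 64
q = torch.randn(batch, seqlen, 8, headdim, dtype=torch.float16, device="cuda")
# GQA: 8 query heads share 2 KV heads (query heads must be a multiple of KV heads)
k = torch.randn(batch, seqlen, 2, headdim, dtype=torch.float16, device="cuda")
v = torch.randn_like(k)

# One ALiBi slope per query head, fp32 as the kernel expects
alibi_slopes = torch.tensor([2.0 ** -i for i in range(1, 9)], device="cuda")
out = flash_attn_func(q, k, v, causal=True, alibi_slopes=alibi_slopes)
```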
-
PSA: new ExLlamaV2 quant method makes 70Bs perform much better at low bpw quants
Doesn't seem so: https://github.com/Dao-AILab/flash-attention/issues/542. No updates for a while.
-
VLLM: 24x faster LLM serving than HuggingFace Transformers
I wonder how this compares to Flash Attention (https://github.com/HazyResearch/flash-attention), which is the other "memory aware" Attention project I'm aware of.
I guess Flash Attention is more about utilizing GPU SRAM correctly, where this is more about using the OS/CPU memory better?
-
Hacking Around ChatGPT’s Character Limits with the Code Interpreter
https://github.com/HazyResearch/flash-attention
- Flash Attention on Consumer
-
Unlimiformer: Long-Range Transformers with Unlimited Length Input
After a very quick read, that's my understanding too: It's just KNN search. So I agree on points 1-3. When something works well, I don't care much about point 4.
I've had only mixed success with KNN search. Maybe I haven't done it right? Nothing seems to work quite as well for me as explicit token-token interactions by some form of attention, which as we all know is too costly for long sequences (O(n²)). Lately I've been playing with https://github.com/hazyresearch/safari, which uses a lot less compute and seems promising. Otherwise, for long sequences I've yet to find something better than https://github.com/HazyResearch/flash-attention for n×n interactions and https://github.com/glassroom/heinsen_routing for n×m interactions. If anyone here has other suggestions, I'd love to hear about them.
-
Ask HN: Bypassing GPT-4 8k tokens limit
Longer sequence length in transformers is an active area of research (see e.g the great work from the Flash-attention team - https://github.com/HazyResearch/flash-attention), and I'm sure will improve things dramatically very soon.
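Worth noting alongside these mentions: since PyTorch 2.0, a FlashAttention-style fused kernel is also reachable through the built-in scaled-dot-product-attention op, with no extra install. A minimal sketch:

```python
# PyTorch 2.x's built-in SDPA dispatches to a fused FlashAttention-style
# kernel on supported CUDA GPUs for fp16/bf16 inputs.
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 1024, 64, dtype=torch.float16, device="cuda")  # (batch, heads, seqlen, dim)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```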
xformers
-
Practical Experience: Integrating Over 50 Neural Networks Into One Open-Source Project
Check xformers compatibility: visit the xformers GitHub repo to ensure compatibility with your torch and CUDA versions. Support for older versions can be dropped, so staying updated is vital, especially if you're running CUDA 11.8 and want to leverage xformers for limited VRAM.
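A quick way to check the match from inside the environment (for a full kernel-availability report, the repo also ships its own diagnostic, `python -m xformers.info`):

```python
# Compare the torch/CUDA versions in the environment against the
# installed xformers build; version mismatches are the usual cause
# of the "can't load C++/CUDA extensions" warnings quoted below.
import torch

print("torch:", torch.__version__)    # e.g. 2.1.0+cu118
print("CUDA:", torch.version.cuda)    # toolkit version torch was built with

try:
    import xformers
    print("xformers:", xformers.__version__)
except ImportError as exc:
    print("xformers not importable:", exc)
```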
- An Interview with AMD CEO Lisa Su About Solving Hard Problems
- Animediff error
-
Colab | Errors when installing x-formers
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.7.12 requires torch<2.1,>=1.7, but you have torch 2.1.0+cu118 which is incompatible.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 2.1.0+cu118 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 2.1.0+cu118 which is incompatible.
torchtext 0.15.2 requires torch==2.0.1, but you have torch 2.1.0+cu118 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 2.1.0+cu118 which is incompatible.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.0+cu121 with CUDA 1201 (you have 2.1.0+cu118)
    Python 3.10.13 (you have 3.10.12)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
xformers version: 0.0.22.post3
-
FlashAttention-2, 2x faster than FlashAttention
This enables V1; V2 has yet to be integrated into xformers. The team replied saying it should happen this week.
See the relevant Github issue here: https://github.com/facebookresearch/xformers/issues/795
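For reference, the op where those FlashAttention kernels surface in xformers; a minimal sketch assuming `pip install xformers` and a CUDA GPU, with tensors in (batch, seqlen, heads, head_dim) layout:

```python
# xformers' memory-efficient attention picks the best available backend
# (including FlashAttention kernels, once integrated) at runtime.
import torch
import xformers.ops as xops

q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = xops.memory_efficient_attention(q, k, v)  # same shape as q
```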
-
Xformers issue
My xformers doesn't work, any help? See error info: ( Exception training model: 'Refer to https://github.com/facebookresearch/xformers for more information on how to install xformers'. )
-
Having xformer troubles
ModuleNotFoundError: Refer to https://github.com/facebookresearch/xformers for more information on how to install xformers
-
Question: these 4 crappy pictures have been generated with the same seed and settings. Why do they keep coming out mildly different?
Xformers is a module that can be used with Stable Diffusion. It decreases the memory required to generate an image as well as speeding things up. It works very well, but there are two problems with Xformers:
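For context, this is how xformers is typically switched on in 🤗 Diffusers (a sketch; the model id is an illustrative choice). One widely reported caveat, and the likely answer to the question above: the xformers kernels are not bitwise-deterministic, so even a fixed seed can yield slightly different images.

```python
# A sketch of enabling xformers attention for Stable Diffusion in Diffusers.
# Assumes `pip install diffusers transformers xformers` and a CUDA GPU;
# the model id is illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # lower VRAM, faster steps

generator = torch.Generator("cuda").manual_seed(42)
image = pipe("a watercolor fox", generator=generator).images[0]
image.save("fox.png")
```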
-
Stuck trying to update xformers
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 1.13.1+cu117 with CUDA 1107 (you have 2.0.1+cu118)
    Python 3.10.9 (you have 3.10.7)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
=================================================================================
You are running xformers 0.0.16rc425.
The program is tested to work with xformers 0.0.17.
To reinstall the desired version, run with commandline flag --reinstall-xformers.
Use --skip-version-check commandline argument to disable this check.
=================================================================================
-
Question about updating Xformers for A1111
# Your version of xformers is 0.0.16rc425.
# xformers >= 0.0.17.dev is required to be available on the Dreambooth tab.
# Torch 1 wheels of xformers >= 0.0.17.dev are no longer available on PyPI,
# but you can manually download them by going to: https://github.com/facebookresearch/xformers/actions
# Click on the most recent action tagged with a release (middle column).
# Select a download based on your environment.
# Unzip your download
# Activate your venv and install the wheel: (from A1111 project root)
cd venv/Scripts
activate
pip install {REPLACE WITH PATH TO YOUR UNZIPPED .whl file}
# Then restart your project.
What are some alternatives?
TensorRT - NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
SHARK-Studio - SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
stable-diffusion-webui - Stable Diffusion web UI
RWKV-LM - RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN and transformer - great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.
InvokeAI - Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.
memory-efficient-attention-pytorch - Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"
Dreambooth-Stable-Diffusion - Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
XMem - [ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
diffusers - 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
alpaca_lora_4bit
stablediffusion - High-Resolution Image Synthesis with Latent Diffusion Models