flash-attention

Fast and memory-efficient exact attention (by Dao-AILab)
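
For orientation, here is a minimal usage sketch, assuming the flash-attn 2.x Python package, PyTorch, and a CUDA GPU with fp16/bf16 tensors; the call follows the package's flash_attn_func interface.

    # Minimal sketch: exact attention via the flash-attn package (assumes flash-attn 2.x,
    # PyTorch, and a CUDA GPU; tensors must be fp16 or bf16).
    import torch
    from flash_attn import flash_attn_func

    batch, seqlen, nheads, headdim = 2, 1024, 8, 64

    # q, k, v are laid out as (batch, seqlen, nheads, headdim)
    q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # Exact (not approximate) attention computed without materializing the full
    # seqlen x seqlen score matrix; causal=True applies a causal mask.
    out = flash_attn_func(q, k, v, causal=True)  # same shape as q
    print(out.shape)  # torch.Size([2, 1024, 8, 64])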

Flash-attention Alternatives

Similar projects and alternatives to flash-attention

  1. diffusers

    🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

  2. whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  3. GFPGAN

    GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

  4. RWKV-LM

    RWKV (pronounced RwaKuv) is an RNN with strong LLM performance that can also be trained directly like a GPT transformer (parallelizable); the current version is RWKV-7 "Goose". It combines the best of RNNs and transformers: great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.

  5. DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

  6. xformers

    Hackable and optimized Transformers building blocks, supporting a composable construction. (A minimal usage sketch appears after this list.)

  7. StableLM

    StableLM: Stability AI Language Models

  8. lm-evaluation-harness

    A framework for few-shot evaluation of language models.

  9. kernl

    Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

  10. TensorRT

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

  11. RWKV-v2-RNN-Pile

    RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.

  12. memory-efficient-attention-pytorch

    (Discontinued) Implementation of memory-efficient multi-head attention as proposed in the paper "Self-attention Does Not Need O(n²) Memory".

  13. heinsen_routing

    Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All Domains" (Heinsen, 2019), for composing deep neural networks.

  14. safari

    Convolutions for Sequence Modeling

  15. TruthfulQA

    TruthfulQA: Measuring How Models Imitate Human Falsehoods

  16. XMem

    [ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

  17. CodeRL

    Official code for the paper "CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning" (NeurIPS 2022).

NOTE: The mention count for each project reflects mentions in common posts plus user-suggested alternatives, so a higher count indicates a more frequently suggested or more similar flash-attention alternative.
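
Among the projects above, xformers is functionally the closest alternative. Here is a minimal sketch of its memory-efficient attention op, assuming the xformers package and a CUDA GPU; the (batch, seqlen, heads, head_dim) layout mirrors the flash-attn example above.

    # Minimal sketch: memory-efficient exact attention via xformers (assumes the
    # xformers package and a CUDA GPU; fp16/bf16/fp32 tensors are accepted).
    import torch
    import xformers.ops as xops

    q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)  # (batch, seqlen, heads, head_dim)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # Like flash-attention, this computes exact attention without materializing the
    # full attention matrix; a suitable kernel is selected automatically.
    out = xops.memory_efficient_attention(q, k, v)
    print(out.shape)  # torch.Size([2, 1024, 8, 64])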

flash-attention reviews and mentions

Posts with mentions or reviews of flash-attention. We have used some of these posts to build our list of alternatives and similar projects; the most recent was on 2024-07-11.

Stats

Basic flash-attention repo stats
Mentions: 27
Stars: 15,142
Activity: 9.2
Last commit: 5 days ago
