flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only) (by tspeterkim)

Flash-attention-minimal Alternatives

Similar projects and alternatives to flash-attention-minimal

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a better flash-attention-minimal alternative or higher similarity.

flash-attention-minimal reviews and mentions

Posts with mentions or reviews of flash-attention-minimal. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-31.
  • Google's First Tensor Processing Unit: Architecture
    2 projects | news.ycombinator.com | 31 Mar 2024
Vulkan is a driver-level API. It competes with DirectX and OpenGL.

CUDA is a language in which you write kernels. It competes with OpenAI's Triton language.

    Here's what CUDA looks like (a toy sketch of the same style appears after this list): https://github.com/tspeterkim/flash-attention-minimal/blob/m...

    This is what Triton looks like: https://triton-lang.org/main/getting-started/tutorials/06-fu...

    By contrast, Vulkan looks like this: https://github.com/KhronosGroup/Vulkan-Samples/blob/main/sam...

    (It's true that you could perhaps use Vulkan shaders to write deep learning kernels, but I'm not aware of anyone doing it.)

  • Show HN: Flash Attention in ~100 lines of CUDA
    2 projects | news.ycombinator.com | 16 Mar 2024
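
To give a concrete feel for the "write the kernel yourself" style the comment above contrasts with Triton and Vulkan, here is a minimal, hypothetical CUDA sketch. It is a toy vector-add kernel, not code from the flash-attention-minimal repo, and only illustrates the explicit thread indexing and launch configuration that CUDA kernels use.

```cuda
// Toy CUDA example (not from the linked repo): illustrates the hand-written
// kernel style, with explicit thread indexing and launch configuration.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(a, b, c, n);  // explicit grid/block launch
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

A real attention kernel like the one in flash-attention-minimal adds shared-memory tiling and an online softmax on top of this basic pattern, whereas Triton expresses the same tiling at a higher level of abstraction.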

Stats

Basic flash-attention-minimal repo stats
Mentions: 2
Stars: 410
Activity: 5.7
Last commit: 24 days ago

tspeterkim/flash-attention-minimal is an open source project licensed under the Apache License 2.0, which is an OSI-approved license.

The primary programming language of flash-attention-minimal is CUDA.
