Implementation of Flash Attention in Jax
Why do you think that https://github.com/OATML/RHO-Loss is a good alternative to flash-attention-jax?