Implementation of Block Recurrent Transformer - Pytorch
Why do you think https://github.com/lucidrains/flash-attention-jax is a good alternative to block-recurrent-transformer-pytorch?