[NeurIPS 22] [AAAI 24] A recurrent, Transformer-based architecture for long-context processing.
Why do you think https://github.com/Dao-AILab/flash-attention is a good alternative to recurrent-memory-transformer?
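The two projects address long contexts differently: FlashAttention is an IO-aware kernel that computes exact full attention faster and with less memory, but its compute still grows quadratically with sequence length, while the Recurrent Memory Transformer keeps attention local to fixed-size segments and carries information between them via memory tokens. Below is a minimal toy sketch of that segment-level recurrence; `toy_block` is a hypothetical stand-in for a Transformer layer, not the actual RMT code.

```python
def toy_block(tokens, memory):
    # Hypothetical stand-in for a Transformer layer: mixes segment tokens
    # with the incoming memory and summarizes the segment into new memory.
    mixed = [t + sum(memory) / len(memory) for t in tokens]
    new_memory = [sum(mixed) / len(mixed)] * len(memory)
    return mixed, new_memory

def rmt_forward(sequence, segment_len=4, num_memory=2):
    """Process a long sequence segment by segment, passing memory forward.

    Per-segment attention cost stays O(segment_len**2) regardless of total
    sequence length -- unlike full attention, which FlashAttention makes
    faster and more memory-efficient but leaves quadratic in compute.
    """
    memory = [0.0] * num_memory
    outputs = []
    for start in range(0, len(sequence), segment_len):
        segment = sequence[start:start + segment_len]
        out, memory = toy_block(segment, memory)  # recurrence over segments
        outputs.extend(out)
    return outputs, memory

outs, mem = rmt_forward([float(i) for i in range(10)])
# One output per input token; the final memory summarizes earlier segments.
```

So the honest comparison is that FlashAttention accelerates exact attention within a (long) window, whereas RMT trades exactness for a recurrence that scales to sequences far beyond any single attention window; the two can even be combined, with FlashAttention used inside each segment.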