[NeurIPS 22] [AAAI 24] A recurrent, Transformer-based architecture for long-context processing.
Why do you think https://github.com/Dao-AILab/flash-attention is a good alternative to recurrent-memory-transformer?
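The two projects address long contexts differently: FlashAttention is an IO-aware kernel that computes exact full attention faster and with less memory, but its compute still grows quadratically with sequence length, while the Recurrent Memory Transformer keeps attention local to fixed-size segments and carries information between them via memory tokens. Below is a minimal toy sketch of that segment-level recurrence; `toy_block` is a hypothetical stand-in for a Transformer layer, not the actual RMT code.

```python
def toy_block(tokens, memory):
    # Hypothetical stand-in for a Transformer layer: mixes segment tokens
    # with the incoming memory and summarizes the segment into new memory.
    mixed = [t + sum(memory) / len(memory) for t in tokens]
    new_memory = [sum(mixed) / len(mixed)] * len(memory)
    return mixed, new_memory

def rmt_forward(sequence, segment_len=4, num_memory=2):
    """Process a long sequence segment by segment, passing memory forward.

    Per-segment attention cost stays O(segment_len**2) regardless of total
    sequence length -- unlike full attention, which FlashAttention makes
    faster and more memory-efficient but leaves quadratic in compute.
    """
    memory = [0.0] * num_memory
    outputs = []
    for start in range(0, len(sequence), segment_len):
        segment = sequence[start:start + segment_len]
        out, memory = toy_block(segment, memory)  # recurrence over segments
        outputs.extend(out)
    return outputs, memory

outs, mem = rmt_forward([float(i) for i in range(10)])
# One output per input token; the final memory summarizes earlier segments.
```

So the honest comparison is that FlashAttention accelerates exact attention within a (long) window, whereas RMT trades exactness for a recurrence that scales to sequences far beyond any single attention window; the two can even be combined, with FlashAttention used inside each segment.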