A PyTorch implementation of the Transformer model in "Attention is All You Need".
Why do you think that https://github.com/google-research/long-range-arena is a good alternative to attention-is-all-you-need-pytorch?