A simple but complete full-attention transformer with a set of promising experimental features from various papers
Why do you think that https://github.com/facebookresearch/metaseq is a good alternative to x-transformers?