Reproducing the Linear Multihead Attention introduced in the Linformer paper (Linformer: Self-Attention with Linear Complexity).
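
Below is a minimal PyTorch sketch of the Linformer-style linear multi-head attention idea: keys and values are compressed along the sequence dimension with learned projections `E` and `F`, so attention costs O(n·k) instead of O(n²). The class name, `seq_len`, and `proj_k` arguments are illustrative assumptions, not the repository's actual API.

```python
import torch
import torch.nn as nn


class LinearMultiheadAttentionSketch(nn.Module):
    """Illustrative Linformer-style attention (not the repo's implementation)."""

    def __init__(self, embed_dim, num_heads, seq_len, proj_k=128):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Standard Q/K/V and output projections.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        # Linformer's low-rank projections map the sequence axis n -> k.
        self.E = nn.Linear(seq_len, proj_k, bias=False)
        self.F = nn.Linear(seq_len, proj_k, bias=False)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim)
        B, N, D = x.shape
        H, Hd = self.num_heads, self.head_dim

        q = self.q_proj(x).view(B, N, H, Hd).transpose(1, 2)  # (B, H, N, Hd)
        k = self.k_proj(x).view(B, N, H, Hd).transpose(1, 2)  # (B, H, N, Hd)
        v = self.v_proj(x).view(B, N, H, Hd).transpose(1, 2)  # (B, H, N, Hd)

        # Compress keys/values along the sequence axis: (B, H, N, Hd) -> (B, H, k, Hd).
        k = self.E(k.transpose(-1, -2)).transpose(-1, -2)
        v = self.F(v.transpose(-1, -2)).transpose(-1, -2)

        # Attention map is (B, H, N, k) rather than (B, H, N, N).
        attn = torch.softmax(q @ k.transpose(-1, -2) / Hd ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)      # (B, N, D)
        return self.out_proj(out)


if __name__ == "__main__":
    # Usage example with illustrative sizes.
    attn = LinearMultiheadAttentionSketch(embed_dim=64, num_heads=4, seq_len=256, proj_k=32)
    x = torch.randn(2, 256, 64)
    print(attn(x).shape)  # torch.Size([2, 256, 64])
```

The key design choice is that `E` and `F` act on the sequence dimension (of length `seq_len`) rather than the feature dimension, which is why a fixed maximum sequence length must be known when the module is built.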
Why do you think that https://github.com/lucidrains/tab-transformer-pytorch is a good alternative to Linear-Multihead-Attention?