A TensorFlow implementation of Fastformer ("Additive Attention Can Be All You Need"), a Transformer variant.
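The core idea of Fastformer is to replace pairwise query-key attention with additive attention: each token gets a scalar importance score, the scores summarize the sequence into a single global query (and then a global key), and these global vectors interact with every token elementwise, giving linear rather than quadratic cost in sequence length. A minimal single-head sketch in plain numpy (the scoring vectors `w_q` and `w_k` and the function name are illustrative assumptions, not this repo's API):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(q, k, v, w_q, w_k):
    """Simplified single-head Fastformer-style additive attention.

    q, k, v : (n, d) projected token representations
    w_q, w_k: (d,) learned scoring vectors (hypothetical names)
    """
    d = q.shape[-1]
    alpha = softmax(q @ w_q / np.sqrt(d))   # (n,) per-token query weights
    global_q = alpha @ q                    # (d,) global query vector
    p = global_q * k                        # (n, d) elementwise query-key mix
    beta = softmax(p @ w_k / np.sqrt(d))    # (n,) per-token key weights
    global_k = beta @ p                     # (d,) global key vector
    return global_k * v                     # (n, d); total cost O(n*d), not O(n^2)
```

The full model adds per-head linear projections and an output transform with a query residual; this sketch only shows why the attention itself is linear in `n`.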
Why do you think that https://github.com/lucidrains/TimeSformer-pytorch is a good alternative to Fast-Transformer?