Suggest an alternative to Transformer-in-Transformer

An Implementation of Transformer in Transformer in TensorFlow for image classification, attention inside local patches