This repository contains the TensorFlow implementation of the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", which introduced the Vision Transformer (ViT).
Why do you think that https://github.com/google-research/maxvit is a good alternative to vision_transformer_tf?