Yet another minimalistic TensorFlow (re-)re-implementation of Karpathy's PyTorch re-implementation of the OpenAI GPT (Generative Pre-trained Transformer).
Why do you think that https://github.com/biswajitsahoo1111/D2L_Attention_Mechanisms_in_TF is a good alternative to gpt-mini?