Guide: Finetune GPT2-XL (1.5 billion parameters) and GPT-NEO (2.7B) on a single GPU with Hugging Face Transformers using DeepSpeed
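The core idea is to pair the Transformers `Trainer` with DeepSpeed's ZeRO optimizer sharding and CPU offload, so the optimizer states no longer have to fit in GPU memory. Below is a minimal sketch of what such a setup can look like; the ZeRO config values, hyperparameters, toy dataset, and script name are illustrative assumptions, not the repo's exact training script.

```python
# Hedged sketch (not the repo's exact script): fine-tuning gpt2-xl on one GPU
# with Hugging Face Transformers + DeepSpeed ZeRO stage 2 and CPU optimizer
# offload. Config values, toy dataset, and hyperparameters are assumptions
# chosen for illustration. Launch under the DeepSpeed launcher, e.g.:
#   deepspeed train_gpt2xl.py
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# ZeRO stage 2 shards optimizer states and gradients; offloading the Adam
# states to CPU RAM is what lets a 1.5B-parameter model train on one GPU.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # "auto" lets the Trainer fill these in from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Toy stand-in corpus; a real run would load and tokenize an actual dataset.
raw = Dataset.from_dict({"text": ["First tiny training example.",
                                  "Second tiny training example."]})
train_dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

args = TrainingArguments(
    output_dir="gpt2xl-finetuned",
    per_device_train_batch_size=1,   # tiny micro-batch to fit in GPU memory
    gradient_accumulation_steps=8,   # recover an effective batch size of 8
    fp16=True,
    learning_rate=5e-5,
    num_train_epochs=1,
    deepspeed=ds_config,             # Trainer hands this dict to DeepSpeed
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    # mlm=False makes the collator build causal-LM labels from input_ids
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The offload is the piece that matters on a single GPU: Adam keeps two fp32 states per parameter, which for 1.5 billion parameters is roughly 12 GB before weights and gradients are even counted, so moving those states into CPU RAM is what brings the GPU footprint within reach of a single card.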
Why do you think https://github.com/shreyansh26/Extracting-Training-Data-from-Large-Langauge-Models is a good alternative to finetune-gpt2xl?