Ongoing research on training transformer language models at scale, including BERT & GPT-2
Why do you think that https://github.com/EleutherAI/gpt-neox is a good alternative to Megatron-DeepSpeed?