Implementation of ST-MoE, the latest incarnation of mixture-of-experts (MoE) after years of research at Brain, in PyTorch
Why do you think https://github.com/microsoft/DeepSpeed is a good alternative to st-moe-pytorch?
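For context on what an ST-MoE-style layer does at its core: each token is scored against every expert by a learned gate, and only the top-2 experts (with renormalized gate weights) process that token. Below is a minimal, library-agnostic sketch of that top-2 routing step in pure Python; it is an illustration of the idea, not the st-moe-pytorch or DeepSpeed API, and the names `top2_route` and `gate_weights` are hypothetical.

```python
import math
import random

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top2_route(token, gate_weights):
    # score each expert with a linear gate (dot product), then keep the top-2
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    probs = softmax(scores)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    # renormalize the two kept gate probabilities so they sum to 1;
    # the token's output would be this weighted mix of the two experts' outputs
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]

random.seed(0)
dim, num_experts = 4, 8
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [random.gauss(0, 1) for _ in range(dim)]
routing = top2_route(token, gate_weights)
print(routing)  # two (expert_index, weight) pairs, weights summing to 1
```

In the real libraries this runs batched on the GPU, and ST-MoE adds auxiliary losses (e.g. the router z-loss) to keep the gate stable; DeepSpeed-MoE's value is mainly its distributed expert parallelism on top of the same routing idea.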