A PyTorch implementation of Sparsely-Gated Mixture of Experts (MoE), for massively increasing the parameter count of language models.
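To make the idea concrete, here is a minimal, illustrative sketch of the sparsely-gated top-k routing that this kind of layer performs: a gating network scores every expert, only the k highest-scoring experts are actually run, and their outputs are combined with the renormalized gate weights. This is a toy pure-Python sketch of the concept, not the repository's actual implementation (which operates on PyTorch tensors and batches); the function names and the scalar "experts" are invented for illustration.

```python
import math

def top_k_gate(logits, k):
    # Keep only the k largest gate logits and softmax over just those;
    # all other experts get zero weight and are never executed.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

def moe_forward(x, experts, gate_logits, k=2):
    # Sparsely-gated MoE forward pass: run only the selected experts
    # and sum their outputs weighted by the gate probabilities.
    weights = top_k_gate(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy example: four "experts", each a scalar function standing in
# for a feed-forward sub-network.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
y = moe_forward(3.0, experts, gate_logits=[0.1, 2.0, 2.0, -1.0], k=2)
# Experts 1 and 2 tie for the top-2 slots (weight 0.5 each),
# so y = 0.5 * 6.0 + 0.5 * 9.0 = 7.5
```

Because compute scales with k rather than with the total number of experts, the parameter count can grow massively while the per-token cost stays roughly constant.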
Why do you think https://github.com/lucidrains/enformer-pytorch is a good alternative to mixture-of-experts?