Both EleutherAI's gpt-neox and the BigScience project use DeepSpeed under the hood, probably because DeepSpeed still remains the best option for training very large models. So whether DeepSpeed is still your answer, or whether you can get away with the native PyTorch alternatives, really depends on your scale.
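For context, here's a minimal sketch of what plugging a model into DeepSpeed's ZeRO looks like. The config keys follow DeepSpeed's documented JSON config; the toy model, batch size, and learning rate are placeholders, and it assumes a DeepSpeed version that accepts the config as a Python dict:

```python
import torch
import deepspeed

# Placeholder model; any torch.nn.Module works here.
model = torch.nn.Linear(1024, 1024)

# Minimal DeepSpeed config: ZeRO stage 2 shards optimizer state
# and gradients across data-parallel workers.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
}

# deepspeed.initialize wraps the model in an engine that handles
# ZeRO partitioning, gradient accumulation, and fp16 loss scaling.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```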
Things are slowly moving into PyTorch upstream, such as the ZeRO redundancy optimizer, but in my experience the team behind DeepSpeed just moves faster. There is also fairscale from the FAIR team, which seems to be a staging ground for experimental optimizations before they move into PyTorch. If you use Lightning, it's easy enough to try out these various libraries (docs here).
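As a sketch of the upstreamed piece: the ZeRO-style optimizer-state sharding lives in PyTorch as torch.distributed.optim.ZeroRedundancyOptimizer and is meant to be used inside an already-initialized DDP job (values here are illustrative):

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

# Inside a torch.distributed (DDP) script: shard optimizer state across
# ranks, roughly analogous to DeepSpeed's ZeRO stage 1.
def configure_optimizer(model: torch.nn.Module) -> ZeroRedundancyOptimizer:
    return ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-4,
    )
```

And in Lightning, switching between these backends is mostly a one-argument change on the Trainer, which is what makes comparing them cheap (assuming a recent Lightning release where Trainer takes a strategy argument; device counts are placeholders):

```python
import pytorch_lightning as pl

# Swap the distributed backend without touching the model code.
trainer = pl.Trainer(accelerator="gpu", devices=8, strategy="deepspeed_stage_2")
# trainer = pl.Trainer(accelerator="gpu", devices=8, strategy="fsdp")  # fairscale sharding
# trainer = pl.Trainer(accelerator="gpu", devices=8, strategy="ddp")   # plain data parallel
```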