Both EleutherAI's gpt-neox and the BigScience project use DeepSpeed under the hood, probably because DeepSpeed still remains the best option for training very large models. So whether DeepSpeed is still your answer, or whether you can get away with the native PyTorch alternatives, really depends on your scale.
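For context, here's a minimal sketch of what plugging a model into DeepSpeed's ZeRO looks like. The config keys follow DeepSpeed's documented JSON config; the toy model, batch size, and learning rate are placeholders, and it assumes a DeepSpeed version that accepts the config as a Python dict:

```python
import torch
import deepspeed

# Placeholder model; any torch.nn.Module works here.
model = torch.nn.Linear(1024, 1024)

# Minimal DeepSpeed config: ZeRO stage 2 shards optimizer state
# and gradients across data-parallel workers.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
}

# deepspeed.initialize wraps the model in an engine that handles
# ZeRO partitioning, gradient accumulation, and fp16 loss scaling.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```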
Things are slowly moving into PyTorch upstream, such as the ZeRO redundancy optimizer, but in my experience the team behind DeepSpeed just moves faster. There is also fairscale from the FAIR team, which seems to be a staging ground for experimental optimizations before they move into PyTorch. If you use Lightning, it's easy enough to try out these various libraries (docs here).
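As a sketch of the upstreamed piece: the ZeRO-style optimizer-state sharding lives in PyTorch as torch.distributed.optim.ZeroRedundancyOptimizer and is meant to be used inside an already-initialized DDP job (values here are illustrative):

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

# Inside a torch.distributed (DDP) script: shard optimizer state across
# ranks, roughly analogous to DeepSpeed's ZeRO stage 1.
def configure_optimizer(model: torch.nn.Module) -> ZeroRedundancyOptimizer:
    return ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-4,
    )
```

And in Lightning, switching between these backends is mostly a one-argument change on the Trainer, which is what makes comparing them cheap (assuming a recent Lightning release where Trainer takes a strategy argument; device counts are placeholders):

```python
import pytorch_lightning as pl

# Swap the distributed backend without touching the model code.
trainer = pl.Trainer(accelerator="gpu", devices=8, strategy="deepspeed_stage_2")
# trainer = pl.Trainer(accelerator="gpu", devices=8, strategy="fsdp")  # fairscale sharding
# trainer = pl.Trainer(accelerator="gpu", devices=8, strategy="ddp")   # plain data parallel
```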