Our great sponsors
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
Microsoft and NVIDIA present the Megatron-Turing Natural Language Generation model (MT-NLG), powered by DeepSpeed and Megatron, the largest and robust monolithic transformer language model trained with 530 billion parameters.
Microsoft and NVIDIA present the Megatron-Turing Natural Language Generation model (MT-NLG), powered by DeepSpeed and Megatron, the largest and robust monolithic transformer language model trained with 530 billion parameters.
Related posts
- [P][D] A100 is much slower than expected at low batch size for text generation
- DeepSpeed-FastGen: High-Throughput for LLMs via MII and DeepSpeed-Inference
- DeepSpeed-FastGen: High-Throughput Text Generation for LLMs
- Why async gradient update doesn't get popular in LLM community?
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models (r/MachineLearning)