DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
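The one-line description above can be made a little more concrete: DeepSpeed is typically driven by a JSON configuration file passed to its launcher. The fragment below is a minimal sketch of such a `ds_config.json` enabling mixed precision and ZeRO stage 2 memory optimization; the field names follow DeepSpeed's config schema, but the specific values are illustrative assumptions, not taken from this page.

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 2 },
  "optimizer": {
    "type": "AdamW",
    "params": { "lr": 3e-5 }
  }
}
```

A config like this would usually be supplied to a training script via the `deepspeed` launcher (e.g. `deepspeed train.py --deepspeed_config ds_config.json`, assuming the script wires the flag through to `deepspeed.initialize`).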
Example models using DeepSpeed
Also see the example repo README: https://github.com/microsoft/DeepSpeedExamples/tree/master/a...
> With just one click, you can train, generate and serve a 1.3 billion parameter ChatGPT model within 1.36 hours on a single consumer-grade NVIDIA A6000 GPU with 48GB memory. On a single DGX node with 8 NVIDIA A100-40G GPUs, DeepSpeed-Chat enables training for a 13 billion parameter ChatGPT model in 13.6 hours. On multi-GPU multi-node systems (cloud scenarios), i.e., 8 DGX nodes with 8 NVIDIA A100 GPUs/node, DeepSpeed-Chat can train a 66 billion parameter ChatGPT model in under 9 hours. Finally, it enables 15X faster training than existing RLHF systems.
> The following are some of the open-source examples that are powered by DeepSpeed: Databricks Dolly, LMFlow, CarperAI-TRLX, Huggingface-PEFT
(disclaimer: MSFT/GH employee, not affiliated with this project)
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models (r/MachineLearning)
1 project | /r/datascienceproject | 29 Aug 2023
[P] DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
1 project | /r/MachineLearning | 28 Aug 2023
Using --deepspeed requires lots of manual tweaking
3 projects | /r/Oobabooga | 11 May 2023
DeepSpeed Hybrid Engine for reinforcement learning with human feedback (RLHF)
1 project | /r/u_waynerad | 26 Apr 2023
I'm Stephen Gou, Manager of ML / Founding Engineer at Cohere. Our team specializes in developing large language models. Previously at Uber ATG on perception models for self-driving cars. AMA!
1 project | /r/IAmA | 19 Apr 2023