Repo for testing out the awesome DeepSpeed library. Used for guidance. (by ncoop57)


To test out DeepSpeed, I used the awesome HuggingFace transformers library, which supports DeepSpeed on its non-stable branch (though support is coming to the stable branch in 4.6 🤓). I followed HuggingFace’s awesome instructions on their website for getting started with DeepSpeed and HuggingFace. If you want to follow along at home, I created a GitHub repository with the Dockerfile (I’m addicted to Docker and will probably write a blog post on it too :)) and the test script I used to run my experiments. I tried training the different versions of the awesome T5 model, ranging from the smallish ~60 million parameter version to the humongous 3 billion parameter one. And here are my results: