See also this fine-tuning repo: https://github.com/mallorbc/Finetune_GPTNEO_GPTJ6B
Regarding inference, it seems that DeepSpeed doesn't support inference for GPT-J yet, but they are planning to work on it: https://github.com/microsoft/DeepSpeed/issues/1332
I don't know, I was surprised too. The config recommended for TPU in the how-to we used is hard to compare with the one we used for fine-tuning on GPU with DeepSpeed. The two setups may not be exactly equivalent, so we could be comparing apples to oranges...
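For context on why the GPU/DeepSpeed setup is hard to compare with the TPU one: DeepSpeed fine-tuning is driven by a JSON config that controls batch sizes, precision, and ZeRO memory sharding, none of which have direct TPU equivalents. Below is a minimal sketch of the kind of config involved; all values are illustrative assumptions, not the actual config from the linked repo.

```python
import json

# Hypothetical sketch of a ds_config.json for ZeRO stage-2 fine-tuning of a
# large model like GPT-J on GPUs. Values are illustrative, not the exact
# config used in the thread.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # per-GPU micro-batch size
    "gradient_accumulation_steps": 8,      # effective batch = 1 * 8 per GPU
    "fp16": {"enabled": True},             # half precision to fit 6B params
    "zero_optimization": {
        "stage": 2,                        # shard optimizer state + gradients
        "offload_optimizer": {"device": "cpu"},  # keep Adam state in CPU RAM
    },
}

# Written to disk, this would be passed to the launcher, e.g.
# deepspeed train.py --deepspeed_config ds_config.json
print(json.dumps(ds_config, indent=2))
```

Knobs like ZeRO stage and optimizer offload trade GPU memory for speed, which is one reason throughput numbers from a DeepSpeed GPU run don't map cleanly onto a TPU how-to's settings.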
Related posts
- [P][D] A100 is much slower than expected at low batch size for text generation
- DeepSpeed-FastGen: High-Throughput for LLMs via MII and DeepSpeed-Inference
- DeepSpeed-FastGen: High-Throughput Text Generation for LLMs
- Why async gradient update doesn't get popular in LLM community?
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models (r/MachineLearning)