Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput generation. [Moved to: https://github.com/FMInference/FlexGen]
Why do you think that https://github.com/OpenNMT/CTranslate2 is a good alternative to FlexGen?