Suggest an alternative to

FlexGen

Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput generation. [Moved to: https://github.com/FMInference/FlexGen]

A URL to the alternative repo (e.g. GitHub, GitLab)

Here you can share your experience with the project you are suggesting or its comparison with FlexGen. Optional.

A valid email to send you a verification link when necessary or log in.