Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Why do you think that https://github.com/NVIDIA/TensorRT-LLM is a good alternative to gpt-fast?