A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
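To illustrate the core idea behind holding quantized weights in memory, here is a minimal, hypothetical sketch (not code from this project): weights are stored as low-bit integers plus a scale factor, and dequantized back to floats only when needed, which is what allows a rewrite to use far less memory than full-precision tensors.

```python
def quantize(weights, bits=8):
    """Symmetric per-tensor quantization: floats -> signed ints + scale.

    Illustrative only; real implementations quantize per-group/per-channel
    and pack values below 8 bits.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer representation."""
    return [v * scale for v in q]


# Example: store 4 floats as 4 small ints plus one scale.
w = [0.12, -0.5, 0.33, 0.0]
q, s = quantize(w)
restored = dequantize(q, s)
```

Each restored value differs from the original by at most one quantization step (`s`), which is the accuracy/memory trade-off quantized inference accepts.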