Exllamav2 Alternatives
Similar projects and alternatives to exllamav2
- gptq: Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
- OmniQuant: A simple and powerful quantization technique for LLMs.
exllamav2 reviews and mentions
- 70B Llama 2 at 35 tokens/second on a 4090
Can anyone provide any additional details on the EXL2[0]/GPTQ[1] quantisation, which seems to be the main reason for a speedup in this model?
I had a quick look at the paper which is _reasonably_ clear, but if anyone else has any other sources that are easy to understand, or a quick explanation to give more insight into it, I'd appreciate it.
[0] https://github.com/turboderp/exllamav2#exl2-quantization
> Is it an average between 2 and 3 over all weights?
Yes, I believe it's a weighted average: different quantization levels are applied to different layers or weights, and the headline figure is the mean bits per weight across the whole model. More details on the quantization scheme: https://github.com/turboderp/exllamav2#exl2-quantization
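As a back-of-the-envelope sketch of how a fractional average bitrate (e.g. somewhere between 2 and 3 bits) arises when layers are quantized at different widths: the layer names, sizes, and per-layer bit widths below are illustrative assumptions, not taken from any actual EXL2 measurement.

```python
# Hypothetical mixed-precision layout: (layer name, weight count, bits per weight).
# All numbers here are made up for illustration.
layers = [
    ("attn.q_proj", 4096 * 4096, 3),
    ("attn.k_proj", 4096 * 4096, 2),
    ("mlp.up_proj", 4096 * 11008, 3),
    ("mlp.down_proj", 11008 * 4096, 2),
]

# The reported bitrate is total stored bits divided by total weight count.
total_bits = sum(count * bits for _, count, bits in layers)
total_weights = sum(count for _, count, _ in layers)
avg_bits = total_bits / total_weights
print(f"average bits per weight: {avg_bits:.2f}")  # → 2.50 for these numbers
```

In practice a quantizer would pick per-layer widths to hit a target average bitrate while minimizing quantization error, which is why the resulting average need not be a whole number.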
Stats
turboderp/exllamav2 is an open source project licensed under the MIT License, an OSI-approved license.
The primary programming language of exllamav2 is Python.