Top 4 Python post-training-quantization Projects
- neural-compressor — SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques for TensorFlow, PyTorch, and ONNX Runtime
With the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit, but there are also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which manage 3-bit without as much of a performance drop - I hope to see them used more commonly.
Index
What are some of the best open-source post-training-quantization projects in Python? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | neural-compressor | 1,950 |
| 2 | SqueezeLLM | 566 |
| 3 | Sparsebit | 319 |
| 4 | FQ-ViT | 263 |