+1 on this; the real proof would have been testing both models side by side.
It seems it may be published on GitHub [1], according to HuggingFace [2].
[1] https://github.com/microsoft/unilm/tree/master/bitnet
[2] https://huggingface.co/papers/2402.17764
It does result in significant degradation relative to an unquantized model of the same size, but even with simple llama.cpp K-quantization it's still worth it all the way down to 2 bits. The chart in this llama.cpp PR speaks for itself:
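To make the tradeoff concrete, here is a minimal sketch of block-wise k-bit quantization: each block of weights stores one float scale/offset pair plus a small integer code per weight. This is only loosely in the spirit of llama.cpp's K-quants, not their actual on-disk format; block size and the min-max scheme are assumptions for illustration.

```python
def quantize_blockwise(weights, bits=2, block=32):
    # Illustrative min-max quantization per block (NOT llama.cpp's real
    # Q2_K format): map each weight to an integer in [0, 2**bits - 1].
    levels = 2 ** bits - 1
    codes, params = [], []
    for i in range(0, len(weights), block):
        blk = weights[i:i + block]
        lo, hi = min(blk), max(blk)
        scale = (hi - lo) / levels or 1.0   # avoid div-by-zero on flat blocks
        params.append((scale, lo))
        codes.append([round((w - lo) / scale) for w in blk])
    return codes, params

def dequantize(codes, params):
    # Reconstruct an approximation of the original weights.
    out = []
    for blk, (scale, lo) in zip(codes, params):
        out.extend(c * scale + lo for c in blk)
    return out
```

The degradation the comment mentions is exactly the rounding error here: each weight is off by at most half a block scale, and that scale grows as `bits` shrinks.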
https://github.com/ggerganov/llama.cpp/pull/1684#issue-17396...
People were doing this six years ago:
https://github.com/yashkant/quantized-nets
https://github.com/TropComplique/trained-ternary-quantization
https://github.com/buaabai/Ternary-Weights-Network
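The core trick behind ternary-weight schemes like the repos above is small enough to sketch: threshold the weights into {-1, 0, +1} and pick one scale factor per tensor. This is an illustrative sketch, not any of those repos' actual code; `delta_frac=0.7` is the heuristic threshold commonly attributed to the Ternary Weight Networks paper, assumed here.

```python
def ternarize(weights, delta_frac=0.7):
    # Threshold-based ternary quantization: weights whose magnitude is
    # below delta become 0, the rest become +/- one shared scale alpha.
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_frac * mean_abs                      # sparsity threshold
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0     # per-tensor scale

    def q(w):
        if w > delta:
            return alpha
        if w < -delta:
            return -alpha
        return 0.0

    return [q(w) for w in weights]
```

The resulting tensor has at most three distinct values, so it can be stored in under two bits per weight plus one float for the scale, which is what makes the BitNet-style "1.58-bit" framing possible.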
https://github.com/Stability-AI/StableLM?tab=readme-ov-file#...