FQ-ViT vs SqueezeLLM

| | FQ-ViT | SqueezeLLM |
|---|---|---|
| Mentions | 2 | 5 |
| Stars | 263 | 569 |
| Growth | 0.4% | 3.3% |
| Activity | 1.1 | 6.9 |
| Last commit | about 1 year ago | 7 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
FQ-ViT
- How to quantize a Swin transformer model?
  This is my implementation of the approach I shared (https://github.com/megvii-research/FQ-ViT), applied to a small Kaggle dataset (https://www.kaggle.com/datasets/gauravduttakiit/ants-bees) in this notebook: https://colab.research.google.com/drive/1cqnmosPIVZu3e2SwbO_VbevANk5MppVS?usp=sharing
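A minimal sketch for readers asking the same question: plain-PyTorch post-training quantization that fake-quantizes the Linear weights of a Swin model to int8 with per-tensor min-max scaling. This is not FQ-ViT itself (whose contributions are Power-of-Two Factor for LayerNorm and Log-Int-Softmax for attention); the `timm` dependency and the model name are assumptions for illustration.

```python
# Sketch: per-tensor min-max fake quantization of a Swin transformer's
# Linear weights. Not FQ-ViT's PTF/LIS method; illustration only.
import torch
import timm  # assumed installed; model name is an example

def fake_quantize(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Uniform symmetric quantization: w -> round(w / s) * s."""
    qmax = 2 ** (n_bits - 1) - 1          # 127 for int8
    scale = w.abs().max() / qmax          # per-tensor scale
    return (w / scale).round().clamp(-qmax, qmax) * scale

model = timm.create_model("swin_tiny_patch4_window7_224", pretrained=True)
model.eval()

with torch.no_grad():
    for m in model.modules():
        if isinstance(m, torch.nn.Linear):
            m.weight.copy_(fake_quantize(m.weight, n_bits=8))

# Evaluate on a held-out set to measure the drop vs. the FP32 baseline.
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 1000])
```

The FQ-ViT repo provides its own evaluation scripts for its full pipeline; the sketch above only shows the baseline min-max step that such methods improve on.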
SqueezeLLM
- Llama33B vs Falcon40B vs MPT30B
  With the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit, but there are also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which manage 3-bit without as much of a performance drop - I hope to see them used more commonly.
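The gap the commenter describes shows up even in a toy experiment: with plain round-to-nearest uniform quantization, reconstruction error grows roughly 4x for each bit removed, so 3-bit is markedly worse than 4-bit unless the method compensates (AWQ via activation-aware scaling, SqueezeLLM via non-uniform codebooks). A minimal sketch, using a random Gaussian matrix as a stand-in for real weights:

```python
# Toy illustration: round-to-nearest uniform quantization error vs. bit-width.
# Real methods use per-group scales or codebooks; this is the naive baseline.
import torch

torch.manual_seed(0)
w = torch.randn(4096, 4096)  # stand-in for an LLM weight matrix

for n_bits in (8, 4, 3, 2):
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_q = (w / scale).round().clamp(-qmax, qmax) * scale
    mse = (w - w_q).pow(2).mean().item()
    print(f"{n_bits}-bit  MSE = {mse:.2e}")
```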
- Has anyone tried out SqueezeLLM?
  [Paper][Github][Model]
  - SqueezeLLM: Dense-and-Sparse Quantization
  - The new SqueezeLLM quantization method allows lossless compression at 3-bit and outperforms GPTQ and AWQ at both 3-bit and 4-bit. Quantized Vicuna and LLaMA models have been released.
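For context, the paper's dense-and-sparse decomposition keeps a tiny fraction of outlier weights in full precision (a sparse matrix) and quantizes the dense remainder with a non-uniform codebook learned by k-means. The sketch below illustrates that split under simplified assumptions; it is not the repo's implementation, and it omits SqueezeLLM's sensitivity-weighted (Fisher-based) clustering. The 0.1% outlier threshold and 8-entry (3-bit) codebook are illustrative.

```python
# Sketch of a dense-and-sparse split with a k-means codebook (plain k-means,
# not SqueezeLLM's sensitivity-weighted variant). Values are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
w.flat[rng.choice(w.size, 100, replace=False)] *= 20.0  # inject outliers

# 1) Sparse part: top ~0.1% of weights by magnitude stay in full precision.
threshold = np.quantile(np.abs(w), 0.999)
sparse_mask = np.abs(w) > threshold

# 2) Dense part: non-uniform 3-bit quantization via an 8-centroid codebook.
dense_vals = w[~sparse_mask].reshape(-1, 1)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(dense_vals)
codebook = kmeans.cluster_centers_.ravel()                 # 8 float values
codes = kmeans.predict(w.reshape(-1, 1)).reshape(w.shape)  # 3-bit indices

# Reconstruction: codebook lookup for the dense part, exact values for outliers.
w_hat = np.where(sparse_mask, w, codebook[codes].astype(np.float32))
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

Storing 3-bit codes plus a ~0.1% sparse correction keeps the dense part tiny while the outliers, which dominate quantization error, are preserved exactly.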
What are some alternatives?
Efficient-AI-Backbones - Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
llm-awq - AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Sparsebit - A model compression and acceleration toolbox based on PyTorch.
Qwen-7B - The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud. [Moved to: https://github.com/QwenLM/Qwen]
transformer-quantization
Qwen - The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Pretrained-Language-Model - Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
GoLLIE - Guideline following Large Language Model for Information Extraction
LLMCompiler - [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
LocalMentor - Local Startup Advisor Chatbot