Metric | FQ-ViT | Sparsebit
---|---|---
Mentions | 2 | 1
Stars | 263 | 320
Growth | 0.4% | 1.3%
Activity | 1.1 | 5.9
Last commit | about 1 year ago | 4 months ago
Language | Python | Python
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
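The page does not give the formula behind the activity number. As a rough illustration only, the sketch below assumes that each commit's weight halves every 30 days and that the final score is a percentile rank scaled to 0-10. The function names, the half-life, and the decay shape are all assumptions made for this example, not the tracker's actual method.

```python
def weighted_commit_score(commit_ages_days, half_life_days=30.0):
    """Sum per-commit weights; a commit's weight halves every `half_life_days`.

    The exponential decay and the 30-day half-life are assumptions.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_scores(raw_scores):
    """Map raw scores to a 0-10 scale by percentile rank among tracked projects.

    `raw_scores` is a dict of project name -> weighted commit score.
    A result of 9.0 means 90% of the tracked projects scored lower.
    """
    n = len(raw_scores)
    result = {}
    for name, score in raw_scores.items():
        below = sum(1 for other in raw_scores.values() if other < score)
        result[name] = round(10.0 * below / n, 1)
    return result
```

Under this assumed scheme, a score of 9.0 means that 90% of tracked projects have a lower weighted commit score, which matches the "top 10%" reading given above.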
FQ-ViT
How to quantize a Swin transformer model?
This is my implementation of the approach I shared (https://github.com/megvii-research/FQ-ViT) on a small dataset from Kaggle (https://www.kaggle.com/datasets/gauravduttakiit/ants-bees), in this notebook: https://colab.research.google.com/drive/1cqnmosPIVZu3e2SwbO_VbevANk5MppVS?usp=sharing
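For readers unfamiliar with what quantizing a model's weights involves, the sketch below shows plain min-max uniform quantization of a list of floats to 8-bit integers in pure Python. It is a generic illustration only: it is not the FQ-ViT method and not the code in the linked notebook, and the function names `quantize_minmax` and `dequantize` are made up for this example.

```python
def quantize_minmax(values, num_bits=8):
    """Quantize floats to unsigned integers using the observed min/max range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range so that 0.0 is always exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 when every value is zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    # Round to the nearest integer step, shift by the zero point, then clip.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]
```

Each value is reconstructed to within roughly one quantization step (`scale`) of the original; methods aimed at vision transformers refine how ranges are chosen for layers such as LayerNorm and softmax, which this sketch does not attempt.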
Sparsebit
What are some alternatives?
Efficient-AI-Backbones - Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
LLaMA-8bit-LoRA - Repository for Chat LLaMA - training a LoRA for the LLaMA (1 or 2) models on HuggingFace with 8-bit or 4-bit quantization. Research only.
transformer-quantization
sparsegpt-for-LLaMA - Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
tabmat - Efficient matrix representations for working with tabular data
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
alpaca-lora - Instruct-tune LLaMA on consumer hardware
aimet - AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
trl - Train transformer language models with reinforcement learning.
llama.cpp - LLM inference in C/C++
nncf - Neural Network Compression Framework for enhanced OpenVINO™ inference
neural-compressor - SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime