Top 17 Python model-compression Projects
-
Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
-
Torch-Pruning
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
-
Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
-
model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
TensorFlow Model Optimization Toolkit
-
DeepCache
-
SqueezeLLM
-
archai
Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research.
-
KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
-
q-diffusion
-
picollm
-
only_train_once_personal_footprint
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
-
SVD-LLM
-
EuLLM
Open-source platform for creating, distributing and running sovereign EU-compliant LLMs. Verticalize any model for your domain, language and brand. AI Act ready.
Project mention: Show HN: Replacing cloud LLM APIs with local, domain-specific models | news.ycombinator.com | 2026-03-25
-
MQAT
-
SatQuant
Fixing TFLite INT8 quantization for small objects in satellite imagery. A drop-in wrapper for focus-based calibration.
Project mention: SatQuant: Fix YOLOv8 quantization accuracy on satellite imagery (Edge TPU) | news.ycombinator.com | 2025-11-28
-
glq
E8 lattice codebook quantization for LLM weights — 2/3/4 bpw with fused Triton inference kernel
-
reap-mlx
Project mention: Show HN: Ported Cerebras Reap to MLX – Prune Moe Experts on a MacBook | news.ycombinator.com | 2026-06-01
Python model-compression discussion
Python model-compression related posts
-
CVPR Edition: Voxel51 Filtered Views Newsletter - June 21, 2024
-
Llama33B vs Falcon40B vs MPT30B
-
[P] Help: I want to compress EfficientnetV2 using pruning.
-
SqueezeLLM: Dense-and-Sparse Quantization
-
New quantization method SqueezeLLM allows for lossless compression for 3-bit and outperforms GPTQ and AWQ in both 3-bit and 4-bit. Quantized Vicuna and LLaMA models have been released.
-
Researchers From China Introduce Vision GNN (ViG): A Graph Neural Network For Computer Vision Systems
-
GNN for computer vision, beating CNN & Transformer
-
Index
What are some of the best open-source model-compression projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Efficient-AI-Backbones | 4,417 |
| 2 | Torch-Pruning | 3,313 |
| 3 | Pretrained-Language-Model | 3,158 |
| 4 | model-optimization | 1,573 |
| 5 | DeepCache | 964 |
| 6 | SqueezeLLM | 718 |
| 7 | archai | 485 |
| 8 | KVQuant | 421 |
| 9 | q-diffusion | 370 |
| 10 | picollm | 311 |
| 11 | only_train_once_personal_footprint | 310 |
| 12 | SVD-LLM | 295 |
| 13 | EuLLM | 25 |
| 14 | MQAT | 6 |
| 15 | SatQuant | 3 |
| 16 | glq | 3 |
| 17 | reap-mlx | 0 |