Top 14 Python model-compression Projects
- Efficient-AI-Backbones: Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
- Pretrained-Language-Model: Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.
- model-optimization: A toolkit to optimize ML models for deployment with Keras and TensorFlow, including quantization and pruning.
  - Project mention: CVPR Edition: Voxel51 Filtered Views Newsletter - June 21, 2024 | dev.to | 2024-06-21
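The toolkit's pruning support is built around magnitude pruning: the smallest-magnitude weights are zeroed until a target sparsity is reached. A minimal standalone sketch of that idea (not the toolkit's actual API, which applies pruning per layer during training):

```python
# Conceptual sketch of magnitude pruning: zero out the smallest-magnitude
# weights until the requested sparsity is reached.

def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest |w| values set to 0."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Magnitude threshold below which weights are dropped.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    pruned, remaining = [], n_prune
    for w in weights:
        if remaining and abs(w) <= threshold:
            pruned.append(0.0)
            remaining -= 1  # counter guards against over-pruning on ties
        else:
            pruned.append(w)
    return pruned
```

Real pruning schedules typically increase the sparsity gradually and fine-tune between steps, so the remaining weights can compensate for the removed ones.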
- archai: Accelerate your Neural Architecture Search (NAS) through fast, reproducible, and modular research.
- KVQuant: [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization.
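KV-cache quantization saves memory by storing attention keys and values at low bit-widths. As a minimal illustration of the underlying mechanics, here is a plain uniform affine quantization round-trip (KVQuant itself uses non-uniform, per-channel schemes, so this is only the basic idea):

```python
# Uniform affine quantization round-trip: map floats to `bits`-bit integers
# over the observed value range, then back to floats.

def quantize_dequantize(values, bits=3):
    """Quantize to 2**bits levels, then dequantize back to floats."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0   # avoid zero scale on constant input
    ints = [round((v - lo) / scale) for v in values]  # integers in [0, levels]
    return [i * scale + lo for i in ints]
```

The round-trip error is bounded by half the quantization step, which is why range handling (and outlier treatment) dominates low-bit quantization quality.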
- only_train_once_personal_footprint: OTOv1-v3 (NeurIPS, ICLR, TMLR): DNN training, compression, structured pruning, erasing operators; CNN, diffusion, and LLM models.
  - Project mention: On-Device LLM Inference Powered by X-Bit Quantization | news.ycombinator.com | 2024-05-29
- UPop: [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Python model-compression related posts
- CVPR Edition: Voxel51 Filtered Views Newsletter - June 21, 2024
- Llama33B vs Falcon40B vs MPT30B
- [P] Help: I want to compress EfficientnetV2 using pruning.
- SqueezeLLM: Dense-and-Sparse Quantization
  - The new SqueezeLLM quantization method enables lossless 3-bit compression and outperforms GPTQ and AWQ at both 3-bit and 4-bit. Quantized Vicuna and LLaMA models have been released.
- Researchers From China Introduce Vision GNN (ViG): A Graph Neural Network For Computer Vision Systems
  - A GNN for computer vision that beats CNNs and Transformers.
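The "dense-and-sparse" idea referenced in the SqueezeLLM post above is simple to sketch: a few large-magnitude outlier weights are kept in full precision (the sparse part) while everything else is quantized (the dense part). A toy illustration of the split, not SqueezeLLM's actual code, which also uses sensitivity-weighted non-uniform quantization:

```python
# Split a weight vector into a quantizable dense part and a small set of
# full-precision outliers stored sparsely as (index, value) pairs.

def dense_and_sparse_split(weights, outlier_fraction=0.05):
    """Return (dense, sparse): outliers zeroed in dense, kept in sparse."""
    n_out = max(1, int(len(weights) * outlier_fraction))
    # Outliers are the largest-magnitude entries.
    by_mag = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    outlier_idx = set(by_mag[:n_out])
    sparse = {i: weights[i] for i in outlier_idx}
    dense = [0.0 if i in outlier_idx else w for i, w in enumerate(weights)]
    return dense, sparse
```

Removing outliers narrows the range the quantizer has to cover, which is what makes very low bit-widths (3-bit and below) viable for the dense remainder.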
Index
What are some of the best open-source model-compression projects in Python? This list will help you:
| # | Project | Stars |
|---|---------|-------|
1 | Efficient-AI-Backbones | 4,186 |
2 | Pretrained-Language-Model | 3,081 |
3 | Torch-Pruning | 2,980 |
4 | model-optimization | 1,531 |
5 | DeepCache | 886 |
6 | SqueezeLLM | 685 |
7 | archai | 475 |
8 | q-diffusion | 347 |
9 | KVQuant | 339 |
10 | only_train_once_personal_footprint | 302 |
11 | picollm | 233 |
12 | SVD-LLM | 198 |
13 | UPop | 101 |
14 | MQAT | 3 |