InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now. Learn more →
Top 23 Python quantization Projects
Take a look at the hardware requirements at https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
A 'LoRA' is a memory-efficient type of fine-tuning that only tunes a small fraction of the LLM's parameters, and 'quantization' reduces an LLM to, say, 4 bits per parameter. So it's feasible to fine-tune a 7B-parameter model at home.
Anything bigger than 7B parameters and you'll want to look at renting GPUs on a platform like Runpod. In the current market, used 4090s are selling on eBay for $2100 while Runpod will rent you a 4090 for $0.34/hr - you do the math.
It's certainly possible to scale model training to span multiple nodes, but generally scaling through bigger GPUs and more GPUs per machine is easier.
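The comment above can be put in rough numbers. The sketch below is back-of-the-envelope only; the ~0.2% LoRA parameter footprint is an illustrative assumption, not a measurement:

```python
# Back-of-envelope memory and cost math for home fine-tuning.
# All figures are illustrative assumptions, not benchmarks.

def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory needed just to hold the model weights."""
    return params * bits_per_param / 8 / 1e9

params_7b = 7e9
fp16_gb = weight_memory_gb(params_7b, 16)   # full-precision baseline
int4_gb = weight_memory_gb(params_7b, 4)    # 4-bit quantized

# LoRA trains only small adapter matrices; ~0.2% of total parameters
# is a typical footprint (assumed here), so optimizer state stays tiny.
lora_params = 0.002 * params_7b

print(f"7B weights at fp16:  {fp16_gb:.1f} GB")   # 14.0 GB
print(f"7B weights at 4-bit: {int4_gb:.1f} GB")   # 3.5 GB, fits a consumer GPU
print(f"LoRA trainable params: {lora_params/1e6:.0f}M")

# Buy-vs-rent break-even for the prices quoted above:
hours = 2100 / 0.34
print(f"$2100 buys ~{hours:,.0f} rented 4090-hours (~{hours/24/365:.1f} years)")
```

At those prices, the used card only pays for itself after the better part of a year of continuous training.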
Project mention: Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model | news.ycombinator.com | 2024-10-14
Hi, I don't know what's SOTA, but I got good results with these (open source, on-device):
https://github.com/SYSTRAN/faster-whisper (speech-to-text)
Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
xTuring
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
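To make the blurb concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, the simplest of the schemes listed above. This is not neural-compressor's API; the real library adds calibration, per-channel scales, and mixed precision on top of this core idea:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8: map the largest magnitude to 127."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256).astype(np.float32)
q, s = quantize_int8(w)

# Round-trip error is bounded by half a quantization step.
err = np.abs(dequantize(q, s) - w).max()
assert err <= s / 2 + 1e-6
```

Weights are stored in a quarter of fp32's space; the scale factor is the only extra metadata per tensor.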
aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Project mention: Quantized Llama models with increased speed and a reduced memory footprint | news.ycombinator.com | 2024-10-24
You can estimate the impact of context length with a back-of-the-envelope calculation of the KV cache size: 2 * layers * attention_heads * head_dim * bytes_per_element * batch_size * sequence_length
Some pretty charts here https://github.com/pytorch/ao/issues/539
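The formula from the comment above, as a runnable sketch. The shape values below (32 layers, 32 KV heads, head dimension 128, fp16 cache) are illustrative assumptions roughly matching a Llama-2-7B-class model:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, bytes_per_elem,
                   batch_size, seq_len):
    # Factor of 2 covers the separate K and V tensors per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * batch_size * seq_len

gib = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128,
                     bytes_per_elem=2, batch_size=1, seq_len=4096) / 2**30
print(f"KV cache at 4k context: {gib:.1f} GiB")  # 2.0 GiB
```

Note the linear scaling: doubling either batch size or sequence length doubles the cache, which is why long contexts dominate memory at inference time.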
intel-extension-for-pytorch
A Python package that extends official PyTorch, making it easy to obtain extra performance on Intel platforms
Project mention: Intel Announces Arc B-Series "Battlemage" Discrete Graphics with Linux Support | news.ycombinator.com | 2024-12-03
nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Project mention: WorldGen: Open-Source 3D Scene Generator for Game/VR/XR | news.ycombinator.com | 2025-04-30
Have you seen https://github.com/mit-han-lab/nunchaku for running Flux on lower VRAM, and thus with faster performance?
I have put up a PR https://github.com/ZiYang-xie/WorldGen/pull/7
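The SVDQuant idea in the blurb above can be illustrated with a toy NumPy sketch. This is not the paper's actual algorithm (which also uses smoothing to migrate activation outliers and a fused inference kernel); it only shows why absorbing dominant directions into a low-rank component makes the residual easier to quantize at 4 bits:

```python
import numpy as np

def quantize_4bit(x):
    """Symmetric 4-bit quantization (levels -7..7), dequantized right away."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -7, 7) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W[0, 0] = 50.0  # an outlier that wrecks the naive quantization scale

# Keep the top-r singular directions in full precision...
U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 4
low_rank = (U[:, :r] * S[:r]) @ Vt[:r]
# ...and quantize only the (outlier-free) residual to 4 bits.
residual = W - low_rank

naive = quantize_4bit(W)
svd_q = low_rank + quantize_4bit(residual)

print("naive 4-bit mean error:     ", np.abs(naive - W).mean())
print("low-rank + 4-bit mean error:", np.abs(svd_q - W).mean())
```

The low-rank branch costs a few extra matrix-vector products at inference, but the quantization step size for the residual shrinks dramatically once the outlier mass is removed.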
model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
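As a minimal sketch of the pruning half of that toolkit, here is magnitude pruning in plain NumPy. The real API wraps Keras layers and applies a gradual sparsity schedule during training; this one-shot version only shows the core idea:

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.random.default_rng(1).standard_normal((8, 8))
pruned = magnitude_prune(w, 0.5)
print("sparsity:", (pruned == 0).mean())  # 0.5
```

Sparse weights compress well on disk, and with structured variants of the same idea, hardware can skip the zeroed computations entirely.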
llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Project mention: LLM compressor: compress models for efficient deployment | news.ycombinator.com | 2024-08-20
Python quantization related posts
- LLM compressor: compress models for efficient deployment
- Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow
- Apple Explores Home Robotics as Potential 'Next Big Thing'
- Half-Quadratic Quantization of Large Machine Learning Models
- New Mixtral HQQ Quantized 4-bit/2-bit configuration
- [D] Which framework do you use for applying post-training quantization on image classification models?
A note from our sponsor - InfluxDB
www.influxdata.com | 19 May 2025
Index
What are some of the best open-source quantization projects in Python? This list will help you find them:
| # | Project | Stars |
|---|---------|-------|
| 1 | LLaMA-Factory | 49,134 |
| 2 | Chinese-LLaMA-Alpaca | 18,816 |
| 3 | faster-whisper | 15,973 |
| 4 | deepsparse | 3,149 |
| 5 | Pretrained-Language-Model | 3,081 |
| 6 | optimum | 2,898 |
| 7 | xTuring | 2,646 |
| 8 | neural-compressor | 2,400 |
| 9 | mixtral-offloading | 2,303 |
| 10 | aimet | 2,302 |
| 11 | ao | 2,036 |
| 12 | intel-extension-for-pytorch | 1,844 |
| 13 | nunchaku | 1,738 |
| 14 | mmrazor | 1,597 |
| 15 | model-optimization | 1,534 |
| 16 | llm-compressor | 1,337 |
| 17 | nncf | 1,010 |
| 18 | optimum-quanto | 934 |
| 19 | finn | 823 |
| 20 | hqq | 812 |
| 21 | OmniQuant | 805 |
| 22 | SqueezeLLM | 688 |
| 23 | fastT5 | 578 |