Python quantization

Open-source Python projects in the quantization category

Top 23 Python quantization Projects

  1. LlamaFactory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    Project mention: Llama-Factory: Unified, Efficient Fine-Tuning for 100 Open LLMs | news.ycombinator.com | 2025-09-18

  2. faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server | dev.to | 2026-05-09

    Transcribes it locally using faster-whisper

  3. Chinese-LLaMA-Alpaca

    Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

  4. bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch.

  5. nunchaku

    [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

  6. optimum

    🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

  7. llm-compressor

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

  8. Pretrained-Language-Model

    Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

  9. ao

    PyTorch native quantization and sparsity for training and inference

    Project mention: Gemma 3 270M re-implemented in pure PyTorch for local tinkering | news.ycombinator.com | 2025-08-20

  10. xTuring

    Build, personalize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

  11. neural-compressor

    SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

  12. aimet

    AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

  13. mixtral-offloading

    Run Mixtral-8x7B models in Colab or consumer desktops

  14. mmrazor

    OpenMMLab Model Compression Toolbox and Benchmark.

  15. model-optimization

    A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

    Project mention: 95% Accurate Wake Word Detection: Low-Power CNN + MFCC Guide | dev.to | 2025-10-19

    TensorFlow Model Optimization Toolkit

  16. auto-round

    A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

    Project mention: Advanced Quantization Algorithm for LLMs | news.ycombinator.com | 2026-05-01

    hmm... at Q4_K_M, stock-style quantization is retaining ~99–99.8% of BF16 accuracy, AutoRound pushes that to ~99.4–100.n% (??) the gap is roughly 0.1–0.7 percentage points

    https://github.com/intel/auto-round/blob/main/docs/gguf_alg_...

  17. nncf

    Neural Network Compression Framework for enhanced OpenVINO™ inference

  18. z80ai

    Z80-μLM is a 2-bit quantized language model small enough to run on an 8-bit Z80 processor. Train conversational models in Python, export them as CP/M .COM binaries, and chat with your vintage computer.

    Project mention: Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB | news.ycombinator.com | 2025-12-28

  19. optimum-quanto

    A pytorch quantization backend for optimum

  20. finn

    Dataflow compiler for QNN inference on FPGAs

  21. hqq

    Official implementation of Half-Quadratic Quantization (HQQ)

  22. OmniQuant

    [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

  23. SqueezeLLM

    [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
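
Several projects in this list revolve around low-bit weight quantization: bitsandbytes' k-bit quantization, the INT8/INT4 formats handled by neural-compressor, and z80ai's 2-bit model for the Z80. As a rough illustration of the shared idea, the sketch below maps float weights to signed k-bit integer codes with one shared absmax scale (plain round-to-nearest), then packs 2-bit codes four per byte. It is a minimal, hypothetical example in plain Python: the function names `quantize_absmax`, `dequantize`, `pack_2bit`, and `unpack_2bit` are made up here, and this is not the algorithm used by any listed project, which typically add refinements such as per-group scales or calibration data.

```python
"""Illustrative k-bit weight quantization (hypothetical helpers, not from any listed project)."""
from typing import List, Tuple


def quantize_absmax(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric absmax round-to-nearest quantization to signed `bits`-bit ints.

    Every weight is divided by one shared scale and rounded, so the codes lie
    in [-qmax, qmax] with qmax = 2**(bits - 1) - 1 (127 for 8 bits, 7 for 4 bits,
    1 for 2 bits, i.e. only three of the four 2-bit codes are used).
    """
    if bits < 2:
        raise ValueError("a signed symmetric grid needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1
    absmax = max((abs(w) for w in weights), default=0.0)
    scale = absmax / qmax if absmax > 0 else 1.0
    # round() is Python's round-half-to-even; clamping guards against float drift.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def pack_2bit(codes: List[int]) -> bytes:
    """Pack signed 2-bit codes in {-1, 0, 1} four per byte, lowest bits first."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, c in enumerate(codes[i:i + 4]):
            if c not in (-1, 0, 1):
                raise ValueError(f"code {c} does not fit the 2-bit grid")
            byte |= (c + 1) << (2 * j)  # shift -1..1 to unsigned 0..2
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, n: int) -> List[int]:
    """Inverse of pack_2bit; `n` is the original number of codes."""
    codes = []
    for byte in data:
        for j in range(4):
            codes.append(((byte >> (2 * j)) & 0b11) - 1)
    return codes[:n]  # drop padding slots in the last byte


if __name__ == "__main__":
    w = [0.9, -1.4, 0.3, 0.0, 0.7]
    q4, s4 = quantize_absmax(w, bits=4)
    print("4-bit codes:", q4, "scale:", s4)
    print("reconstructed:", dequantize(q4, s4))

    q2, s2 = quantize_absmax(w, bits=2)
    packed = pack_2bit(q2)
    print(len(w), "weights ->", len(packed), "bytes when packed at 2 bits")
    assert unpack_2bit(packed, len(q2)) == q2
```

Round-to-nearest with a single absmax scale is only a baseline; methods such as AutoRound, OmniQuant, HQQ, and SqueezeLLM exist to lose less accuracy than this at low bit widths.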

Python quantization related posts

  • I built a free, local video transcription tool, because I didn't want to pay $10/hour or upload my files to a stranger's server

    2 projects | dev.to | 9 May 2026
  • Advanced Quantization Algorithm for LLMs

    2 projects | news.ycombinator.com | 1 May 2026
  • LLM compressor: compress models for efficient deployment

    1 project | news.ycombinator.com | 20 Aug 2024
  • Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow

    7 projects | dev.to | 29 Apr 2024
  • Apple Explores Home Robotics as Potential 'Next Big Thing'

    3 projects | news.ycombinator.com | 4 Apr 2024
  • Half-Quadratic Quantization of Large Machine Learning Models

    1 project | news.ycombinator.com | 14 Mar 2024
  • New Mixtral HQQ Quantized 4-bit/2-bit configuration

    1 project | news.ycombinator.com | 18 Dec 2023

Index

What are some of the best open-source quantization projects in Python? The index below ranks them by GitHub stars.

#   Project                     GitHub stars
1   LlamaFactory                71,870
2   faster-whisper              23,393
3   Chinese-LLaMA-Alpaca        18,949
4   bitsandbytes                8,258
5   nunchaku                    3,861
6   optimum                     3,409
7   llm-compressor              3,331
8   Pretrained-Language-Model   3,158
9   ao                          2,843
10  xTuring                     2,667
11  neural-compressor           2,651
12  aimet                       2,634
13  mixtral-offloading          2,329
14  mmrazor                     1,672
15  model-optimization          1,573
16  auto-round                  1,436
17  nncf                        1,169
18  z80ai                       1,092
19  optimum-quanto              1,042
20  finn                        1,003
21  hqq                         940
22  OmniQuant                   899
23  SqueezeLLM                  718

