Python quantization

Open-source Python projects categorized as quantization

Top 23 Python quantization Projects

  1. LLaMA-Factory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    Project mention: Fine-tune Google's Gemma 3 | news.ycombinator.com | 2025-03-19

    Take a look at the hardware requirements at https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...

    A 'LoRA' is a memory-efficient type of fine-tuning that tunes only a small fraction of the LLM's parameters, and 'quantization' reduces an LLM to, say, 4 bits per parameter. Together, they make it feasible to fine-tune a 7B-parameter model at home.

    Anything bigger than 7B parameters and you'll want to look at renting GPUs on a platform like Runpod. In the current market, used 4090s are selling on eBay for around $2,100, while Runpod will rent you a 4090 for $0.34/hr - you do the math.

    It's certainly possible to scale model training to span multiple nodes, but generally scaling through bigger GPUs and more GPUs per machine is easier.
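    The back-of-the-envelope math in the comment above can be sketched out explicitly. All numbers here are illustrative assumptions (a 7B-parameter model, 4-bit weights, a LoRA training roughly 0.5% of parameters, and the quoted eBay/Runpod prices), not measurements:

    ```python
    # Rough memory math for fine-tuning a quantized model with LoRA.
    # All figures are illustrative assumptions, not measurements.

    params = 7e9            # 7B-parameter model
    bits_per_param = 4      # 4-bit quantized weights

    weight_gb = params * bits_per_param / 8 / 1e9   # bits -> bytes -> GB
    print(f"quantized weights: {weight_gb:.1f} GB")  # 3.5 GB

    # LoRA trains only a small fraction of the parameters, so optimizer
    # state (e.g. Adam, ~8 bytes/param for fp32 moments) applies only
    # to that fraction rather than to all 7B weights.
    lora_fraction = 0.005
    optimizer_gb = params * lora_fraction * 8 / 1e9
    print(f"LoRA optimizer state: {optimizer_gb:.2f} GB")  # 0.28 GB

    # Buy vs rent: hours of $0.34/hr rental that a $2,100 used card buys.
    break_even_hours = 2100 / 0.34
    print(f"break-even: {break_even_hours:.0f} hours")  # ~6176 hours
    ```

    So the quantized weights alone fit comfortably in a 24 GB consumer GPU, and the rental break-even sits at several thousand GPU-hours, which is why renting usually wins for occasional fine-tuning runs.
    
    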

  2. InfluxDB

    InfluxDB – Built for High-Performance Time Series Workloads. InfluxDB 3 OSS is now GA. Transform, enrich, and act on time series data directly in the database. Automate critical tasks and eliminate the need to move data externally. Download now.

  3. Chinese-LLaMA-Alpaca

    Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

  4. faster-whisper

    Faster Whisper transcription with CTranslate2

    Project mention: Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model | news.ycombinator.com | 2024-10-14

    Hi, I don't know what's SOTA, but I got good results with these (open source, on-device):

    https://github.com/SYSTRAN/faster-whisper (speech-to-text)

  5. deepsparse

    Sparsity-aware deep learning inference runtime for CPUs

  6. Pretrained-Language-Model

    Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

  7. optimum

    🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

  8. xTuring

    Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

  9. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  10. neural-compressor

    SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

  11. mixtral-offloading

    Run Mixtral-8x7B models in Colab or consumer desktops

  12. aimet

    AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

  13. ao

    PyTorch native quantization and sparsity for training and inference (by pytorch)

    Project mention: Quantized Llama models with increased speed and a reduced memory footprint | news.ycombinator.com | 2024-10-24

    You can estimate the context-length impact with a back-of-the-envelope calculation of KV cache size: 2 * layers * attention_heads * head_dim * bytes_per_element * batch_size * sequence_length

    Some pretty charts here https://github.com/pytorch/ao/issues/539
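    Plugging concrete numbers into that formula makes the impact tangible. The model shape below (32 layers, 32 attention heads, head_dim 128, fp16 cache) is an illustrative Llama-2-7B-style assumption, not taken from the linked issue:

    ```python
    # KV cache size = 2 (K and V) * layers * attention_heads * head_dim
    #                 * bytes_per_element * batch_size * sequence_length
    def kv_cache_bytes(layers, heads, head_dim, bytes_per_element,
                       batch_size, seq_len):
        return (2 * layers * heads * head_dim
                * bytes_per_element * batch_size * seq_len)

    # Llama-2-7B-shaped model, fp16 KV cache, batch of 1, 4096-token context.
    size_fp16 = kv_cache_bytes(layers=32, heads=32, head_dim=128,
                               bytes_per_element=2, batch_size=1,
                               seq_len=4096)
    print(size_fp16 / 2**30, "GiB")  # 2.0 GiB

    # Quantizing the cache to 8-bit halves it.
    size_int8 = kv_cache_bytes(32, 32, 128, 1, 1, 4096)
    print(size_int8 / 2**30, "GiB")  # 1.0 GiB
    ```

    Note that the cache grows linearly with both batch size and sequence length, which is why long contexts dominate memory at inference time even when the weights themselves are heavily quantized.
    
    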

  14. intel-extension-for-pytorch

    A Python package that extends official PyTorch to deliver additional performance on Intel platforms

    Project mention: Intel Announces Arc B-Series "Battlemage" Discrete Graphics with Linux Support | news.ycombinator.com | 2024-12-03
  15. nunchaku

    [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

    Project mention: WorldGen: Open-Source 3D Scene Generator for Game/VR/XR | news.ycombinator.com | 2025-04-30

    Have you seen https://github.com/mit-han-lab/nunchaku for running Flux with lower VRAM usage, and thus faster performance?

    I have put up a PR https://github.com/ZiYang-xie/WorldGen/pull/7

  16. mmrazor

    OpenMMLab Model Compression Toolbox and Benchmark.

  17. model-optimization

    A toolkit for optimizing Keras and TensorFlow ML models for deployment, including quantization and pruning.

  18. llm-compressor

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Project mention: LLM compressor: compress models for efficient deployment | news.ycombinator.com | 2024-08-20
  19. nncf

    Neural Network Compression Framework for enhanced OpenVINO™ inference

  20. optimum-quanto

    A PyTorch quantization backend for Optimum

  21. finn

    Dataflow compiler for QNN inference on FPGAs

  22. hqq

    Official implementation of Half-Quadratic Quantization (HQQ)

  23. OmniQuant

    [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

  24. SqueezeLLM

    [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

  25. fastT5

    ⚡ Boost inference speed of T5 models by 5x and reduce model size by 3x.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Python quantization related posts

  • LLM compressor: compress models for efficient deployment

    1 project | news.ycombinator.com | 20 Aug 2024
  • Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow

    7 projects | dev.to | 29 Apr 2024
  • Apple Explores Home Robotics as Potential 'Next Big Thing'

    3 projects | news.ycombinator.com | 4 Apr 2024
  • Half-Quadratic Quantization of Large Machine Learning Models

    1 project | news.ycombinator.com | 14 Mar 2024
  • New Mixtral HQQ Quantized 4-bit/2-bit configuration

    1 project | news.ycombinator.com | 18 Dec 2023
  • [D] Which framework do you use for applying post-training quantization on image classification models?

    1 project | /r/MachineLearning | 9 Dec 2023
  • Half-Quadratic Quantization of Large Machine Learning Models

    3 projects | news.ycombinator.com | 7 Dec 2023

Index

What are some of the best open-source quantization projects in Python? This list will help you:

# Project Stars
1 LLaMA-Factory 49,134
2 Chinese-LLaMA-Alpaca 18,816
3 faster-whisper 15,973
4 deepsparse 3,149
5 Pretrained-Language-Model 3,081
6 optimum 2,898
7 xTuring 2,646
8 neural-compressor 2,400
9 mixtral-offloading 2,303
10 aimet 2,302
11 ao 2,036
12 intel-extension-for-pytorch 1,844
13 nunchaku 1,738
14 mmrazor 1,597
15 model-optimization 1,534
16 llm-compressor 1,337
17 nncf 1,010
18 optimum-quanto 934
19 finn 823
20 hqq 812
21 OmniQuant 805
22 SqueezeLLM 688
23 fastT5 578

