Top 23 Quantization Open-Source Projects

LLaMA-Factory

2 20,248 9.9 Python

Unify Efficient Fine-Tuning of 100+ LLMs

Project mention: Show HN: GPU Prices on eBay | news.ycombinator.com | 2024-02-23

Depends what model you want to train, and how well you want your computer to keep working while you're doing it.
If you're interested in large language models there's a table of VRAM requirements for fine-tuning at [1] which says you could do the most basic type of fine-tuning on a 7B parameter model with 8GB VRAM.
You'll find that training takes quite a long time, and as a lot of the GPU power is going on training, your computer's responsiveness will suffer - even basic things like scrolling in your web browser or changing tabs use the GPU, after all.
Spend a bit more and you'll probably have a better time.
[1] https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...
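As a rough sanity check on figures like the one quoted above, the memory taken by the model weights alone can be estimated from the parameter count and the number of bits stored per weight. The sketch below is illustrative only: the function name `weight_memory_gb` is hypothetical, and it deliberately ignores activations, gradients, optimizer state, and framework overhead, all of which add to real fine-tuning requirements.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes).

    Assumption: every parameter is stored at the same bit width, with no
    extra storage for quantization scales or other metadata.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # Compare common storage widths for a 7B-parameter model.
    for bits in (16, 8, 4):
        print(f"7B parameters at {bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
```

Under these assumptions, the weights of a 7B model take about 14 GB at 16-bit and about 3.5 GB at 4-bit, which suggests why fitting such a model on an 8GB card generally relies on low-bit quantization.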

Chinese-LLaMA-Alpaca

4 17,348 8.3 Python

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

Project mention: Chinese-Alpaca-Plus-13B-GPTQ | /r/LocalLLaMA | 2023-05-30

I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, which is a GPTQ-format, 4-bit quantised version of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU inference.

faster-whisper

23 8,899 8.1 Python

Faster Whisper transcription with CTranslate2

Project mention: Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow | dev.to | 2024-04-29

Faster-whisper (https://github.com/SYSTRAN/faster-whisper)

pngquant

5 5,012 6.0 C

Lossy PNG compressor — pngquant command based on libimagequant library

Project mention: Random Code Inspiration Volume 2 | dev.to | 2023-10-01

image-shrinker is a simple, easy to use open source tool for shrinking images. Under the hood it uses pngquant, mozjpeg, SVGO, and gifsicle. You can also install these tools individually if you need to compress some images. I often use pngquant after exporting PNGs for web projects from Figma or similar tools. I literally run it like this:

AutoGPTQ

19 3,781 9.3 Python

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Project mention: Setting up LLAMA2 70B Chat locally | /r/developersIndia | 2023-08-18

Pretrained-Language-Model

1 2,960 6.1 Python

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Project mention: Does anyone know a downloadable chatgpt model that supports conversation in Albanian? | /r/Programimi | 2023-05-16

deepsparse

21 2,878 9.5 Python

Sparsity-aware deep learning inference runtime for CPUs

Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23

Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse

CTranslate2

14 2,799 8.9 C++

Fast inference engine for Transformer models

Project mention: Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow | dev.to | 2024-04-29

xTuring

31 2,523 8.4 Python

Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

Project mention: I'm developing an open-source AI tool called xTuring, enabling anyone to construct a Language Model with just 5 lines of code. I'd love to hear your thoughts! | /r/machinelearningnews | 2023-09-07

Explore the project on GitHub here.

mixtral-offloading

3 2,235 8.7 Python

Run Mixtral-8x7B models in Colab or consumer desktops

Project mention: DBRX: A New Open LLM | news.ycombinator.com | 2024-03-27

Waiting for Mixed Quantization with HQQ and MoE Offloading [1]. With that I was able to run Mixtral 8x7B on my 10 GB VRAM RTX 3080... This should work for DBRX and should shave off a ton of the VRAM requirement.
[1] https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...

optimum

8 2,157 9.5 Python

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02

Shout out to Huggingface's Optimum – which made it easier to quantize models.

neural-compressor

3 1,964 9.8 Python

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

aimet

2 1,908 9.6 Python

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

model-optimization

1 1,470 6.8 Python

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

mmrazor

4 1,365 2.8 Python

OpenMMLab Model Compression Toolbox and Benchmark.

intel-extension-for-pytorch

14 1,342 9.7 Python

A Python package that extends the official PyTorch to easily obtain better performance on Intel platforms

Project mention: Efficient LLM inference solution on Intel GPU | news.ycombinator.com | 2024-01-20

OK I found it. Looks like they use SYCL (which for some reason they've rebranded to DPC++): https://github.com/intel/intel-extension-for-pytorch/tree/v2...

rwkv.cpp

12 1,100 6.8 C++

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

Project mention: Eagle 7B: Soaring past Transformers | news.ycombinator.com | 2024-01-28

There's https://github.com/saharNooby/rwkv.cpp, which is related-ish [0] to ggml/llama.cpp
[0]: https://github.com/ggerganov/llama.cpp/issues/846

nncf

2 819 9.7 Python

Neural Network Compression Framework for enhanced OpenVINO™ inference

Project mention: FLaNK Stack Weekly 06 Nov 2023 | dev.to | 2023-11-06

tinyengine

3 740 5.6 C

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory (by mit-han-lab)

Project mention: [D] Run Pytorch model inference on Microcontroller | /r/MachineLearning | 2023-11-14

TinyEngine from MCUNet. Looks great, targeting ARM CM4.

finn

4 665 9.7 Python

Dataflow compiler for QNN inference on FPGAs

Project mention: Hi, What could be the best HLS tool for implementing neural networks on FPGA | /r/FPGA | 2023-06-13

FINN - https://github.com/Xilinx/finn

gpu_poor

3 623 8.3 JavaScript

Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization

Project mention: Ask HN: Cheapest way to run local LLMs? | news.ycombinator.com | 2023-11-26

Here's a simple calculator for LLM inference requirements: https://rahulschand.github.io/gpu_poor/

quanto

1 575 9.7 Python

A PyTorch quantization toolkit

Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22

SqueezeLLM

5 569 6.9 Python

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Project mention: Llama33B vs Falcon40B vs MPT30B | /r/LocalLLaMA | 2023-07-05

Using the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit, but there's also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which are able to manage 3-bit without as much of a performance drop. I hope to see them used more commonly.
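To give a feel for why dropping from 4 bits to 3 bits costs accuracy, here is a minimal sketch that applies plain uniform round-to-nearest quantization to synthetic Gaussian values and compares the reconstruction error at each width. This is not GPTQ, AWQ, or SqueezeLLM, all of which use more sophisticated schemes; the helper names `quantize_rtn` and `mean_squared_error` and the synthetic "weights" are assumptions made purely for illustration.

```python
import random


def quantize_rtn(values, bits):
    """Uniform round-to-nearest quantization over the range of `values`.

    Returns the dequantized (reconstructed) values, so the result can be
    compared directly against the input.
    """
    steps = 2 ** bits - 1              # number of intervals between min and max
    lo, hi = min(values), max(values)
    scale = (hi - lo) / steps if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for a weight tensor; real model weights will differ.
    weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    for bits in (4, 3):
        err = mean_squared_error(weights, quantize_rtn(weights, bits))
        print(f"{bits}-bit round-to-nearest MSE: {err:.5f}")
```

With 3 bits there are only 8 representable levels instead of 16, so each value is rounded to a coarser grid and the average error grows. Methods such as those mentioned above aim to reduce that gap; this sketch only shows the baseline effect.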

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

quantization related posts

Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow

7 projects | dev.to | 29 Apr 2024

Apple Explores Home Robotics as Potential 'Next Big Thing'

3 projects | news.ycombinator.com | 4 Apr 2024

Half-Quadratic Quantization of Large Machine Learning Models

1 project | news.ycombinator.com | 14 Mar 2024

Eagle 7B: Soaring past Transformers

2 projects | news.ycombinator.com | 28 Jan 2024

New Mixtral HQQ Quantized 4-bit/2-bit configuration

1 project | news.ycombinator.com | 18 Dec 2023

[D] Which framework do you use for applying post-training quantization on image classification models?

1 project | /r/MachineLearning | 9 Dec 2023

Half-Quadratic Quantization of Large Machine Learning Models

3 projects | news.ycombinator.com | 7 Dec 2023

Index

What are some of the best open-source quantization projects? This list will help you:

Rank	Project	Stars
1	LLaMA-Factory	20,248
2	Chinese-LLaMA-Alpaca	17,348
3	faster-whisper	8,899
4	pngquant	5,012
5	AutoGPTQ	3,781
6	Pretrained-Language-Model	2,960
7	deepsparse	2,878
8	CTranslate2	2,799
9	xTuring	2,523
10	mixtral-offloading	2,235
11	optimum	2,157
12	neural-compressor	1,964
13	aimet	1,908
14	model-optimization	1,470
15	mmrazor	1,365
16	intel-extension-for-pytorch	1,342
17	rwkv.cpp	1,100
18	nncf	819
19	tinyengine	740
20	finn	665
21	gpu_poor	623
22	quanto	575
23	SqueezeLLM	569