AutoGPTQ vs GPTQ-for-llama?

exllama

Mentions: 64 | Stars: 2,582 | Activity: 9.0 | Language: Python

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

You might want to take a look at https://github.com/turboderp/exllama

GPTQ-for-LLaMa

Mentions: 19 | Stars: 129 | Activity: 7.7 | Language: Python

4-bit quantization of LLaMa using GPTQ (by oobabooga)

If you don't have Triton and you use AutoGPTQ, you're going to notice a huge slowdown compared to the old GPTQ-for-LLaMa CUDA branch. For me, AutoGPTQ gives a whopping 1 token per second, while the old GPTQ gives a decent 9 tokens per second. Both times I used a same-sized model. (I think the slowdown is because AutoGPTQ uses the newer CUDA branch, which is much slower than the old one.)
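To reproduce a tokens-per-second comparison like the one in this comment, a rough timing helper along the following lines could be used. This is only a sketch: `generate` is a hypothetical callable (not part of AutoGPTQ or GPTQ-for-LLaMa) that is assumed to take a prompt and a token budget and return the list of newly generated token IDs, and `fake_generate` is a stand-in so the snippet runs without a GPU or a model.

```python
import time


def tokens_per_second(generate, prompt, max_new_tokens):
    """Time one call to `generate` and return generated tokens per second.

    `generate` is a hypothetical callable: generate(prompt, max_new_tokens)
    is assumed to return a list of newly generated token IDs.
    """
    start = time.perf_counter()
    produced = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very coarse clocks.
    return len(produced) / elapsed if elapsed > 0 else float("inf")


def speedup(fast_tps, slow_tps):
    """Ratio between two throughput measurements."""
    return fast_tps / slow_tps


def fake_generate(prompt, max_new_tokens):
    """Stand-in for a real backend: sleeps briefly, returns dummy token IDs."""
    time.sleep(0.01)
    return list(range(max_new_tokens))
```

With a real backend, each loader's generation call would be wrapped in a function of this shape and measured with the same prompt, token budget, and model size. With the figures quoted above, `speedup(9.0, 1.0)` is a 9x difference between the two branches.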

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA. Post date: 29 May 2023.
