New quantization method AWQ outperforms GPTQ in 4-bit and 3-bit with 1.45x speedup and works with multimodal LLMs

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA.

  • GPTQ-for-LLaMa

    4 bits quantization of LLaMA using GPTQ

  • And exactly what Triton version are they comparing against? I just tried the latest version of this, and on my 4090/12900K I get 77 tokens per second for Llama 7B-128g. My own GPTQ CUDA implementation gets 151 tokens/second on the same model, same hardware. That makes it 96% faster, whereas AWQ is only 79% faster. For 30B-128g I'm currently only getting a 110% speedup over Triton compared to their 178%, but it still seems a little disingenuous to compare against their own CUDA implementation only, when they're trying to present the quantization method as being faster for inference.

  • llm-awq

    AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

  • GitHub: https://github.com/mit-han-lab/llm-awq

  • Voyager

    An Open-Ended Embodied Agent with Large Language Models (by MineDojo)

  • Summary of the study by Claude-100k if anyone is interested:

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
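
The speedup percentages quoted in the GPTQ-for-LLaMa comment above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names `percent_faster` and `factor_to_percent` are hypothetical, and the only inputs are the throughput figures the commenter reports (77 and 151 tokens per second for Llama 7B-128g) plus the 1.45x figure from the post title.

```python
def percent_faster(baseline_tps: float, candidate_tps: float) -> float:
    """Return how much faster the candidate is than the baseline, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate_tps / baseline_tps - 1.0) * 100.0


def factor_to_percent(speedup_factor: float) -> float:
    """Convert a speedup factor such as 1.45x into a 'percent faster' figure."""
    return (speedup_factor - 1.0) * 100.0


# Figures quoted in the comment: Triton vs. the commenter's own CUDA kernel.
triton_tps = 77.0
cuda_tps = 151.0

print(f"CUDA vs. Triton: {percent_faster(triton_tps, cuda_tps):.0f}% faster")
print(f"1.45x speedup:   {factor_to_percent(1.45):.0f}% faster")
```

Both conventions appear in the discussion, so it helps to convert to one before comparing: a "96% faster" result corresponds to a factor of roughly 1.96x, while a "1.45x speedup" is 45% faster. Such comparisons are only meaningful when the baseline, model, and hardware are the same.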

