GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ (by qwopqwop200)

GPTQ-for-LLaMa Alternatives

Similar projects and alternatives to GPTQ-for-LLaMa

  1. llama.cpp

    LLM inference in C/C++

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. textgen

    Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private.

  4. Open-Assistant

    OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

  5. transformers

    🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

  6. whisper.cpp

    Port of OpenAI's Whisper model in C/C++

  7. llama

    Inference code for Llama models

  8. alpaca-lora

    107 GPTQ-for-LLaMa VS alpaca-lora

    Instruct-tune LLaMA on consumer hardware

  9. petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

  10. alpaca.cpp

    (Discontinued) Locally run an Instruction-Tuned Chat-Style LLM

  11. AMD-SHARK-Studio

    AMD-SHARK Studio -- Web UI for SHARK+IREE High Performance Machine Learning Distribution

  12. bitsandbytes

    Accessible large language models via k-bit quantization for PyTorch.

  13. exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  14. tinygrad

    (Discontinued) You like pytorch? You like micrograd? You love tinygrad! ❤️ [Moved to: https://github.com/tinygrad/tinygrad] (by geohot)

  15. llm

    (Discontinued) [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models (by rustformers)

  16. AutoGPTQ

    (Discontinued) An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

  17. GPTQ-for-LLaMa

    4 bits quantization of LLaMa using GPTQ (by oobabooga)

  18. sd-webui-modelscope-text2video

    (Discontinued) Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies [Moved to: https://github.com/deforum-art/sd-webui-text2video]

  19. erasing

    11 GPTQ-for-LLaMa VS erasing

    Erasing Concepts from Diffusion Models

  20. qlora

    86 GPTQ-for-LLaMa VS qlora

    QLoRA: Efficient Finetuning of Quantized LLMs

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better GPTQ-for-LLaMa alternative or higher similarity.

GPTQ-for-LLaMa reviews and mentions

Posts with mentions or reviews of GPTQ-for-LLaMa. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-10.
  • [P] Early in 2023 I put in a lot of work on a new machine learning project. Now I'm not sure what to do with it.
    1 project | /r/MachineLearning | 3 Dec 2023
    First I want to make it clear this is not a self promotion post. I hope many machine learning people come at me with questions or comments about this project. A little background about myself. I did work on the 4 bits quantization of LLaMA using GPTQ. (https://github.com/qwopqwop200/GPTQ-for-LLaMa). I've been studying AI in-depth for many years now.
  • GPT-4 Details Leaked
    3 projects | news.ycombinator.com | 10 Jul 2023
    Deploying the 60B version is a challenge though and you might need to apply 4-bit quantization with something like https://github.com/PanQiWei/AutoGPTQ or https://github.com/qwopqwop200/GPTQ-for-LLaMa . Then you can improve the inference speed by using https://github.com/turboderp/exllama .

    If you prefer to use an "instruct" model à la ChatGPT (i.e. that does not need few-shot learning to output good results) you can use something like this: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored...

  • Rambling
    1 project | /r/PygmalionAI | 30 Jun 2023
    I use gptq-for-llama - from this https://github.com/qwopqwop200/GPTQ-for-LLaMa and Pygmalion 7B.
  • Now that ExLlama is out with reduced VRAM usage, are there any GPTQ models bigger than 7b which can fit onto an 8GB card?
    2 projects | /r/LocalLLaMA | 29 Jun 2023
    exllama is an optimized implementation of GPTQ-for-LLaMa, allowing you to run 4-bit quantized language models with GPU at great speeds.
  • GGML – AI at the Edge
    11 projects | news.ycombinator.com | 6 Jun 2023
    With a single NVIDIA 3090 and the fastest inference branch of GPTQ-for-LLAMA https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/fastest-i..., I get a healthy 10-15 tokens per second on the 30B models. IMO GGML is great (And I totally use it) but it's still not as fast as running the models on GPU for now.
  • New quantization method AWQ outperforms GPTQ in 4-bit and 3-bit with 1.45x speedup and works with multimodal LLMs
    4 projects | /r/LocalLLaMA | 2 Jun 2023
    And exactly what Triton version are they comparing against? I just tried the latest version of this, and on my 4090/12900K I get 77 tokens per second for Llama 7B-128g. My own GPTQ CUDA implementation gets 151 tokens/second on the same model, same hardware. That makes it 96% faster, whereas AWQ is only 79% faster. For 30B-128g I'm currently only getting a 110% speedup over Triton compared to their 178%, but it still seems a little disingenuous to compare against their own CUDA implementation only, when they're trying to present the quantization method as being faster for inference.
  • Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API
    9 projects | /r/LocalLLaMA | 1 Jun 2023
    Thanks for the explanation. I think some repos, like text generation webui, used GPTQ-for-LLaMa (I don't know if it's this repo or another one). Anyway, most repos that I saw rely on external components (like GPTQ-for-LLaMa).
  • How to use AMD GPU?
    4 projects | /r/LocalLLaMA | 1 Jun 2023
    cd ../..
    git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b triton
    cd GPTQ-for-LLaMa
    pip install -r requirements.txt
    mkdir -p ../text-generation-webui/repositories
    ln -s ../../GPTQ-for-LLaMa ../text-generation-webui/repositories/GPTQ-for-LLaMa
  • Help needed with installing quant_cuda for the WebUI
    2 projects | /r/LocalLLaMA | 31 May 2023
    cd repositories
    git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
    pip install -r requirements.txt
  • The installed version of bitsandbytes was compiled without GPU support
    2 projects | /r/Oobabooga | 29 May 2023
    # To use the GPTQ models I need to install GPTQ-for-LLaMa and the monkey patch
    mkdir repositories
    cd repositories
    git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b triton
    cd GPTQ-for-LLaMa
    pip install ninja
    pip install -r requirements.txt
    cd
    cd text-generation-webui
    # download random model
    python download-model.py xxx/yyy
    # try to start the gui
    python server.py
    # It returns this warning but it runs
    bin /home/gm/miniconda3/envs/chat/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
    /home/gm/miniconda3/envs/chat/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
    warn("The installed version of bitsandbytes was compiled without GPU support. "
    /home/gm/miniconda3/envs/chat/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
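
Several of the posts quoted above turn on two pieces of arithmetic: whether a 4-bit model fits on a given card, and how a tokens-per-second figure translates into a "percent faster" claim. The following is a minimal sketch of both, not code from GPTQ-for-LLaMa. The function names are hypothetical, the parameter counts are the nominal sizes implied by the model names (7B, 13B, 30B), and the memory estimate covers the packed weights only: it ignores quantization scales and zero points, activations, and the KV cache, so real VRAM use will be higher.

```python
# Hypothetical helpers for back-of-the-envelope estimates; not part of any library.

# Nominal parameter counts implied by the model names (assumption: "7B" = 7e9, etc.).
NOMINAL_PARAMS = {"7B": 7e9, "13B": 13e9, "30B": 30e9}


def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GiB.

    Ignores per-group scales/zero points, activations and KV cache.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


def percent_faster(baseline_tps: float, new_tps: float) -> float:
    """Relative speedup of new_tps over baseline_tps, as a percentage."""
    return (new_tps / baseline_tps - 1.0) * 100.0


if __name__ == "__main__":
    for name, n_params in NOMINAL_PARAMS.items():
        fp16 = weight_memory_gib(n_params, 16)
        int4 = weight_memory_gib(n_params, 4)
        print(f"{name}: fp16 ~{fp16:.1f} GiB, 4-bit ~{int4:.1f} GiB (weights only)")

    # Figures quoted in the AWQ thread: 77 tokens/s (Triton) vs 151 tokens/s (CUDA).
    print(f"{percent_faster(77, 151):.0f}% faster")
```

Under these assumptions the 4-bit weights of a nominal 13B model come to roughly 6 GiB, which is why the question of fitting models larger than 7B onto an 8GB card depends on how much memory the remaining overhead takes. The 77 and 151 tokens-per-second figures from the AWQ thread work out to about 96% faster, matching the number quoted there.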

Stats

Basic GPTQ-for-LLaMa repo stats
75
3,071
0.5
almost 2 years ago
