AutoGPTQ vs gpt-llama.cpp

AutoGPTQ

An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. (by AutoGPTQ)

gpt-llama.cpp

A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI. (by keldenl)

Suggest topics

Source Code

Suggest alternative

Edit details

InfluxDB - Power Real-Time Data Analytics at Scale

Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

www.influxdata.com

featured

SaaSHub - Software Alternatives and Reviews

SaaSHub helps you find the best software and product alternatives

www.saashub.com

featured

AutoGPTQ		gpt-llama.cpp
	Project
19	Mentions	12
3,806	Stars	587
5.0%	Growth	-
9.3	Activity	8.2
4 days ago	Latest Commit	11 months ago
Python	Language	JavaScript
MIT License	License	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

AutoGPTQ

Posts with mentions or reviews of AutoGPTQ. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-10.

Setting up LLAMA2 70B Chat locally
1 project | /r/developersIndia | 18 Aug 2023
Experience of setting up LLAMA 2 70B Chat locally
1 project | /r/LocalLLaMA | 17 Aug 2023
GPT-4 Details Leaked
3 projects | news.ycombinator.com | 10 Jul 2023

Deploying the 60B version is a challenge though and you might need to apply 4-bit quantization with something like https://github.com/PanQiWei/AutoGPTQ or https://github.com/qwopqwop200/GPTQ-for-LLaMa . Then you can improve the inference speed by using https://github.com/turboderp/exllama .
If you prefer to use an "instruct" model à la ChatGPT (i.e. that does not need few-shot learning to output good results) you can use something like this: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored...
Loader Types
4 projects | /r/oobaboogazz | 26 Jun 2023

AutoGPTQ: an attempt at standardizing GPTQ-for-LLaMa and turning it into a library that is easier to install and use, and that supports more models. https://github.com/PanQiWei/AutoGPTQ
WizardLM-33B-V1.0-Uncensored
1 project | /r/LocalLLaMA | 24 Jun 2023
Any help converting an interesting .bin model to 4 bit 128g GPTQ? Bloke?
1 project | /r/LocalLLaMA | 18 Jun 2023

Just use the script: https://github.com/PanQiWei/AutoGPTQ/blob/main/examples/quantization/quant_with_alpaca.py
LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale
5 projects | news.ycombinator.com | 10 Jun 2023

In the wild, people tend to use GTPQ quantization for pure GPU inference: https://github.com/PanQiWei/AutoGPTQ
And ggml's quant for CPU inference with some offload, which just got updated to a more GPTQ-like method days ago: https://github.com/ggerganov/llama.cpp/pull/1684
Some other runtimes like Apache TVM also have their own quant implementations: https://github.com/mlc-ai/mlc-llm
For training, 4-bit bitsandbytes is SOTA, as far as I know.
TBH I'm not sure why this November paper is being linked. Few are running 8 bit models when they could fit a better 3-5 bit model in the same memory pool.
Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API
9 projects | /r/LocalLLaMA | 1 Jun 2023

Instead of integrating GPTQ-for-Lllama, use AutoGPTQ instead.
AutoGPTQ - An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm
1 project | /r/aipromptprogramming | 1 Jun 2023

1 project | /r/AutoGPT | 31 May 2023

gpt-llama.cpp

Posts with mentions or reviews of gpt-llama.cpp. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-06-22.

Attempt to run Llama on a remote server with chatbot-ui
2 projects | /r/LocalLLaMA | 22 Jun 2023

hi! I really like the solution https://github.com/keldenl/gpt-llama.cpp which helps to deploy https://github.com/mckaywrigley/chatbot-ui on the local model. I am running this together with Wizard7b or 13b locally and it works fine, but when I tried to upload to a remote server I met an error.
Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API
9 projects | /r/LocalLLaMA | 1 Jun 2023

sounds like you’re asking for exactly this? https://github.com/keldenl/gpt-llama.cpp
LLaMA and AutoAPI?
1 project | /r/LocalLLaMA | 17 May 2023
New big update to GPTNicheFinder: better trends analysis and scoring system, cleaned up UI and verbose in the terminal for people who want to see what is going on and to verify the results
2 projects | /r/GPT3 | 16 May 2023

I salut you good sir. This is an amazing idea. I don't have time but it will be interesting idea to use this wrapper https://github.com/keldenl/gpt-llama.cpp which simulates GPT endpoint for local lama, so basically we can have amazing tool for completely free use. If somebody test it please let me know underneath my comment!
I build an AI powered writing tools, an AI co-author
1 project | /r/singularity | 29 Apr 2023

I would gladly buy your product to run with a local model, like Vicuna ggml , also see https://github.com/keldenl/gpt-llama.cpp/
Serge... Just works
3 projects | /r/LocalLLaMA | 28 Apr 2023

possible through fastllama in python or gpt-llama.cpp an API wrapper around llama.cpp
Embeddings?
3 projects | /r/LocalLLaMA | 24 Apr 2023

https://github.com/keldenl/gpt-llama.cpp supports embeddings, and it even takes in openai type requests and returns openai compatible responses!
I built a completely Local AutoGPT with the help of GPT-llama running Vicuna-13B
1 project | news.ycombinator.com | 24 Apr 2023

https://github.com/keldenl/gpt-llama.cpp
I build a completely Local and portable AutoGPT with the help of gpt-llama, running on Vicuna-13b
4 projects | /r/LocalLLaMA | 24 Apr 2023
Adding Long-Term Memory to Custom LLMs: Let's Tame Vicuna Together!
7 projects | /r/LocalLLaMA | 21 Apr 2023

There's a (kind of) working Auto-GPT solution that uses Vicuna https://github.com/keldenl/gpt-llama.cpp/blob/master/docs/Auto-GPT-setup-guide.md

What are some alternatives?

When comparing AutoGPTQ and gpt-llama.cpp you can also consider the following projects:

exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

llama_index - LlamaIndex is a data framework for your LLM applications

llama.cpp - LLM inference in C/C++

Auto-LLM-Local - Created my own python script similar to AutoGPT where you supply a local llm model like alpaca13b (The main one I use), and the script can access the supplied tools to achieve your objective. Code fully works as far as I can tell. Takes me 5 minutes per chain on my slow laptop.

text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

long_term_memory - A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion.

basaran - Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

langchain - ⚡ Building applications with LLMs through composability ⚡ [Moved to: https://github.com/langchain-ai/langchain]

GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ

semantic-kernel - Integrate cutting-edge LLM technology quickly and easily into your apps

self-refine - LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.

langchain - 🦜🔗 Build context-aware reasoning applications

AutoGPTQ vs exllama gpt-llama.cpp vs llama_index AutoGPTQ vs llama.cpp gpt-llama.cpp vs Auto-LLM-Local AutoGPTQ vs text-generation-webui gpt-llama.cpp vs long_term_memory AutoGPTQ vs basaran gpt-llama.cpp vs langchain AutoGPTQ vs GPTQ-for-LLaMa gpt-llama.cpp vs semantic-kernel AutoGPTQ vs self-refine gpt-llama.cpp vs langchain

Compare AutoGPTQ vs gpt-llama.cpp and see what are their differences.

AutoGPTQ

gpt-llama.cpp

AutoGPTQ

gpt-llama.cpp

What are some alternatives?