AutoGPTQ vs koboldcpp

Project         AutoGPTQ       koboldcpp
Mentions        19             180
Stars           3,806          3,817
Growth          5.0%           -
Activity        9.3            10.0
Latest Commit   4 days ago     5 days ago
Language        Python         C++
License         MIT License    GNU Affero General Public License v3.0

The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

AutoGPTQ

Posts with mentions or reviews of AutoGPTQ. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-10.

Setting up LLAMA2 70B Chat locally
1 project | /r/developersIndia | 18 Aug 2023
Experience of setting up LLAMA 2 70B Chat locally
1 project | /r/LocalLLaMA | 17 Aug 2023
GPT-4 Details Leaked
3 projects | news.ycombinator.com | 10 Jul 2023

Deploying the 60B version is a challenge though and you might need to apply 4-bit quantization with something like https://github.com/PanQiWei/AutoGPTQ or https://github.com/qwopqwop200/GPTQ-for-LLaMa . Then you can improve the inference speed by using https://github.com/turboderp/exllama .
If you prefer to use an "instruct" model à la ChatGPT (i.e. that does not need few-shot learning to output good results) you can use something like this: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored...
Loader Types
4 projects | /r/oobaboogazz | 26 Jun 2023

AutoGPTQ: an attempt at standardizing GPTQ-for-LLaMa and turning it into a library that is easier to install and use, and that supports more models. https://github.com/PanQiWei/AutoGPTQ
WizardLM-33B-V1.0-Uncensored
1 project | /r/LocalLLaMA | 24 Jun 2023
Any help converting an interesting .bin model to 4 bit 128g GPTQ? Bloke?
1 project | /r/LocalLLaMA | 18 Jun 2023

Just use the script: https://github.com/PanQiWei/AutoGPTQ/blob/main/examples/quantization/quant_with_alpaca.py
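
For readers unfamiliar with the "4 bit 128g" shorthand in the post title above: it usually means 4-bit weights with a quantization group size of 128, so every 128 weights share one scale and one offset. The sketch below illustrates only that storage layout, using plain round-to-nearest in pure Python. It is not the GPTQ algorithm itself (which also adjusts the remaining weights to compensate for rounding error) and it is not AutoGPTQ's API; all function names here are hypothetical.

```python
import random
from typing import List, Tuple

# One quantized group: (integer codes, scale, minimum value of the group).
Group = Tuple[List[int], float, float]


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Round-to-nearest quantization of one group onto 2**bits levels."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(group: Group) -> List[float]:
    """Map integer codes back to approximate float weights."""
    codes, scale, lo = group
    return [c * scale + lo for c in codes]


def quantize_groupwise(
    weights: List[float], group_size: int = 128, bits: int = 4
) -> List[Group]:
    """Split the weights into fixed-size groups and quantize each separately."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Illustrative data only: 256 random "weights", giving two groups of 128.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize_groupwise(weights)
restored = [w for g in groups for w in dequantize_group(g)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(groups)} groups, max reconstruction error {max_error:.6f}")
```

Smaller groups track the local weight range more closely but store more scales and offsets, which is the trade-off the group size controls.
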
LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale
5 projects | news.ycombinator.com | 10 Jun 2023

In the wild, people tend to use GPTQ quantization for pure GPU inference: https://github.com/PanQiWei/AutoGPTQ
And ggml's quant for CPU inference with some offload, which just got updated to a more GPTQ-like method days ago: https://github.com/ggerganov/llama.cpp/pull/1684
Some other runtimes like Apache TVM also have their own quant implementations: https://github.com/mlc-ai/mlc-llm
For training, 4-bit bitsandbytes is SOTA, as far as I know.
TBH I'm not sure why this November paper is being linked. Few are running 8 bit models when they could fit a better 3-5 bit model in the same memory pool.
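
The memory argument in the last sentence can be checked with rough arithmetic: weight memory is approximately the parameter count times the bits per weight, divided by 8 to get bytes. The sketch below is an illustration under stated assumptions, not a measurement. The 13B and 30B parameter counts are hypothetical examples, and the estimate ignores quantization metadata (scales and zero points), the KV cache, and activations, so real usage will be somewhat higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2 ** 30


# Hypothetical model sizes chosen for illustration.
candidates = [
    ("13B at 8-bit", 13e9, 8),
    ("30B at 4-bit", 30e9, 4),
    ("30B at 3-bit", 30e9, 3),
]

for label, params, bits in candidates:
    print(f"{label}: {weight_memory_gib(params, bits):.1f} GiB")
```

Under these assumptions, a 30B model at 3 to 4 bits lands in the same memory range as a 13B model at 8 bits, which is the trade-off the commenter describes.
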
Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API
9 projects | /r/LocalLLaMA | 1 Jun 2023

Instead of integrating GPTQ-for-LLaMa, use AutoGPTQ.
AutoGPTQ - An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm
1 project | /r/aipromptprogramming | 1 Jun 2023

1 project | /r/AutoGPT | 31 May 2023

koboldcpp

Posts with mentions or reviews of koboldcpp. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-27.

Any Online Communities on Local/Home AI?
1 project | news.ycombinator.com | 24 Apr 2024
Koboldcpp-1.62.1 adds support for Command-R+
1 project | news.ycombinator.com | 9 Apr 2024
Show HN: I made an app to use local AI as daily driver
31 projects | news.ycombinator.com | 27 Feb 2024
Easiest way to show my model to my mom?
2 projects | /r/LocalLLaMA | 10 Dec 2023

FYI this is the easiest way to host on the horde: https://github.com/LostRuins/koboldcpp
IT Veteran... why am I struggling with all of this?
6 projects | /r/LocalLLaMA | 7 Dec 2023
What do you use to run your models?
14 projects | /r/LocalLLaMA | 7 Dec 2023
ByteDance AI researcher suggests that open source model more powerful than Gemini to be released soon
1 project | /r/singularity | 7 Dec 2023
i need some help guys
2 projects | /r/KoboldAI | 7 Dec 2023
[Guide] How install KoboldAI in Android via Termux (Update 04-12-2023)
1 project | /r/KoboldAI | 5 Dec 2023

For more information on Koboldcpp, see this guide: https://github.com/LostRuins/koboldcpp/wiki
SillyTavern 1.10.10 has been released
2 projects | /r/SillyTavernAI | 28 Nov 2023

Out of curiosity, is there a specific reason for this? The most popular fork, KoboldCpp, is in active development, was the first to adopt the Min P sampler, and even distinguishes itself with its context shift feature. Just wondering what this means for the future. Thanks!

What are some alternatives?

When comparing AutoGPTQ and koboldcpp, you can also consider the following projects:

exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

KoboldAI

llama.cpp - LLM inference in C/C++

text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

TavernAI - Atmospheric adventure chat for AI language models (KoboldAI, NovelAI, Pygmalion, OpenAI chatgpt, gpt-4)

basaran - Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.

KoboldAI - KoboldAI is generative AI software optimized for fictional use, but capable of much more!

GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ

ChatRWKV - ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

self-refine - LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.

SillyTavern - LLM Frontend for Power Users. [Moved to: https://github.com/SillyTavern/SillyTavern]
