AutoGPTQ vs Dependencies

| | AutoGPTQ | Dependencies |
|---|---|---|
| Mentions | 19 | 24 |
| Stars | 3,806 | 8,176 |
| Growth | 5.0% | - |
| Activity | 9.3 | 0.0 |
| Latest commit | 5 days ago | about 1 month ago |
| Language | Python | C# |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
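The exact weighting behind the activity number is not published; as an illustration only, a recency-weighted score with an assumed exponential decay over commit age might look like this (the half-life and formula are my assumptions, not the site's):

```python
def activity_score(commit_ages_days, half_life_days=30.0):
    """Toy recency-weighted activity: each commit contributes
    2 ** (-age / half_life), so recent commits count more than old ones.
    (Illustrative assumption only; the real formula is not public.)"""
    return sum(2 ** (-age / half_life_days) for age in commit_ages_days)

# A project with many recent commits scores higher than one with the
# same number of commits made long ago.
recent = activity_score([1, 2, 3, 5, 8])
stale = activity_score([90, 120, 150, 200, 365])
print(recent > stale)  # prints True
```

Under this toy scheme a commit made today contributes 1.0 and a commit made one half-life ago contributes 0.5, which captures the "recent commits have higher weight" behavior described above.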
AutoGPTQ
- Setting up LLAMA2 70B Chat locally
- Experience of setting up LLAMA 2 70B Chat locally
- GPT-4 Details Leaked
Deploying the 60B version is a challenge, though, and you might need to apply 4-bit quantization with something like https://github.com/PanQiWei/AutoGPTQ or https://github.com/qwopqwop200/GPTQ-for-LLaMa . Then you can improve the inference speed by using https://github.com/turboderp/exllama .
If you prefer to use an "instruct" model à la ChatGPT (i.e. that does not need few-shot learning to output good results) you can use something like this: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored...
- Loader Types
AutoGPTQ: an attempt at standardizing GPTQ-for-LLaMa and turning it into a library that is easier to install and use, and that supports more models. https://github.com/PanQiWei/AutoGPTQ
- WizardLM-33B-V1.0-Uncensored
- Any help converting an interesting .bin model to 4 bit 128g GPTQ? Bloke?
Just use the script: https://github.com/PanQiWei/AutoGPTQ/blob/main/examples/quantization/quant_with_alpaca.py
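For reference, the same flow can be sketched with the AutoGPTQ library API directly. This is a minimal sketch, not the linked script: the model paths and calibration text are placeholders, it requires the `auto-gptq` and `transformers` packages plus a GPU, and the linked script additionally uses the Alpaca dataset for calibration.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained = "path/to/fp16-model"     # placeholder
quantized_dir = "path/to/4bit-128g"   # placeholder

tokenizer = AutoTokenizer.from_pretrained(pretrained)

# 4-bit weights with group size 128, matching the "4 bit 128g" naming
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(pretrained, quantize_config)

# Calibration examples: a handful of tokenized prompts (the linked
# script builds these from the Alpaca dataset instead).
examples = [
    tokenizer("auto-gptq is an easy-to-use quantization library.",
              return_tensors="pt")
]

model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
```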
- LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale
In the wild, people tend to use GPTQ quantization for pure GPU inference: https://github.com/PanQiWei/AutoGPTQ
And ggml's quant for CPU inference with some offload, which just got updated to a more GPTQ-like method days ago: https://github.com/ggerganov/llama.cpp/pull/1684
Some other runtimes like Apache TVM also have their own quant implementations: https://github.com/mlc-ai/mlc-llm
For training, 4-bit bitsandbytes is SOTA, as far as I know.
TBH I'm not sure why this November paper is being linked. Few are running 8-bit models when they could fit a better 3-5 bit model in the same memory pool.
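The memory arithmetic behind that claim, as a quick sketch (the parameter counts are illustrative round numbers, and real deployments add overhead for activations, KV cache, and quantization metadata):

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    # Weight storage only; ignores activations, KV cache, and
    # quantization scales/zeros, which add a few percent on top.
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 13B model at 8-bit vs a 30B model at 4-bit:
print(weight_memory_gb(13, 8))  # prints 13.0 (GB)
print(weight_memory_gb(30, 4))  # prints 15.0 (GB)
```

So a much larger model at 4-bit fits in roughly the same memory pool as a smaller model at 8-bit, which is the trade-off the comment is pointing at.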
- Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API
Rather than integrating GPTQ-for-LLaMa, use AutoGPTQ instead.
- AutoGPTQ - An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm
Dependencies
- I can't get Fluidsynth working
I did some digging with Dependencies and found that the issue is with libstdc++-6.dll.
- EXE vs MSI
Maybe Dependency Walker can shed some light on that 🤔
- Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API
I did that, basically. The problem is that there is a clblast.dll (on Windows) that llama.dll depends on, and llama-cpp-python always failed to resolve that dependency. I copied the DLL to the right folder, loading it manually via CDLL worked fine, and https://github.com/lucasg/Dependencies also confirmed the DLL was findable. When loading DLLs, Windows checks the same folder for dependency DLLs (and a few other places).
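The manual-load workaround described above can be sketched with `ctypes`. The `clblast.dll`/`llama.dll` names come from the comment; since those DLLs aren't generally available, the demonstration below loads the platform's C runtime by the same mechanism instead:

```python
import ctypes
import ctypes.util
import os

def preload_dependency(dll_path):
    # Loading a library with ctypes.CDLL maps it into the process, so
    # a later load of a dependent library (e.g. llama.dll needing
    # clblast.dll, per the comment above) finds it already resolved.
    # Raises OSError if the library can't be found.
    return ctypes.CDLL(dll_path)

# Portable demonstration of the same mechanism: load the platform's
# C runtime by name and call into it.
libc_name = ctypes.util.find_library("c") or (
    "msvcrt" if os.name == "nt" else "libc.so.6"
)
libc = preload_dependency(libc_name)
print(libc.abs(-5))  # prints 5
```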
- Unable to get Meshroom to accept images
You can use Dependency Walker to detect the exact version of the MS C++ runtime required.
- Every time I try to open the game this error message shows up. What am I supposed to do?
If that doesn't work, you will have to do it the hard way like I did: using Dependencies to find the missing DLLs.
- Kenshi 1.0.60 Crashes & Bug reports
- FFmpeg 6.0
("Dependencies" is Dependencies.exe from https://github.com/lucasg/Dependencies)
- The game won't launch
- Does the antivirus detecting files as malware depending on the compiling options make any sense?
If you're using MinGW or Cygwin and link in an arbitrary number of system libraries, then you need to ship those files as well. You can use Dependencies to list all DLLs your program is using, including transitive dependencies.
- Software Dependency Tracker
A couple of them, yes. Someone else linked Dependencies, which is much more modern and doesn't have some of the issues these older applications have. Thank you for the suggestions regardless.
What are some alternatives?
exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
SharpUnhooker - C# Based Universal API Unhooker
llama.cpp - LLM inference in C/C++
lddtree - Fork of pax-utils' lddtree.sh
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
Windows-Auto-Night-Mode - Automatically switches between the dark and light theme of Windows 10 [Moved to: https://github.com/AutoDarkMode/Windows-Auto-Night-Mode]
basaran - Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
clauf - A C interpreter developed live on YouTube
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
deeplabel - A cross-platform desktop image annotation tool for machine learning
self-refine - LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
HexCtrl - Fully-featured Hex Control written in C++/MFC.