GPTQ-for-LLaMa vs starcoder

| | GPTQ-for-LLaMa | starcoder |
|---|---|---|
| Mentions | 19 | 10 |
| Stars | 129 | 7,109 |
| Growth | - | 0.7% |
| Activity | 7.7 | 6.6 |
| Latest commit | 11 months ago | 2 months ago |
| Language | Python | Python |
| License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
GPTQ-for-LLaMa
- I have tried various different methods to install, and none work. Can you spoon-feed me how?
git clone https://github.com/oobabooga/GPTQ-for-LLaMa
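For a full manual install, here is a minimal sketch, assuming text-generation-webui is already cloned and its Python environment (with a working CUDA toolchain) is active; the same steps appear in the GPT4All excerpt further down:
cd text-generation-webui  # wherever you have it installed
mkdir -p repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install  # builds the quant_cuda extension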
- Query outputs random text
If you're using the model directly from ehartford, that one hasn't been quantized. Try using the GPTQ quantized version here, and use this fork of GPTQ-for-LLaMa. Load in 4-bit with --wbits 4
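As a rough sketch of that last step (the model folder name is a placeholder, and --groupsize 128 applies only to checkpoints quantized with a group size of 128):
python server.py --model <your-gptq-model-folder> --wbits 4 --groupsize 128  # run from the text-generation-webui directory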
- Help needed with installing quant_cuda for the WebUI
This worked for me on Ubuntu. If you want to use the CUDA branch instead of Triton, follow the same steps, except clone this GPTQ-for-LLaMa fork and run python setup_cuda.py install
- AutoGPTQ vs GPTQ-for-LLaMa?
If you don't have Triton and you use AutoGPTQ, you're going to notice a huge slowdown compared to the old GPTQ-for-LLaMa CUDA branch. For me, AutoGPTQ gives a whopping 1 token per second, compared to a decent 9 tokens per second with the old GPTQ; both times I used the same-sized model. (I think the slowdown is because AutoGPTQ uses the newer CUDA branch, which is much slower than the old one.)
- Guanaco 7B, 13B, 33B and 65B models by Tim Dettmers: now for your local LLM pleasure
Are you using a later version of GPTQ-for-LLaMa? If so, switch to ooba's CUDA fork (https://github.com/oobabooga/GPTQ-for-LLaMa). That's what I made it with, and it definitely works there. It's also what's included in the one-click-installers.
- Any idea why the Vicuna 13B 4-bit model outputs random content?
This usually happens when using models that conflict with your GPTQ installation. You should be using this fork: https://github.com/oobabooga/GPTQ-for-LLaMa. If you did the manual installation wrong, use the one-click installer instead.
- GPT4All: A little helper to get started
cd text-generation-webui  # wherever you have it installed
mkdir -p repositories
cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install
- wizard-vicuna-13B • Hugging Face
- Anyone actually running 30b/65b at reasonably high speed? What's your rig?
The GPTQ-for-LLaMa folder under repositories says it's pointed at https://github.com/oobabooga/GPTQ-for-LLaMa.git. But I've run through the instructions and also applied the monkey patch to train and apply a 4-bit LoRA, which may come into play. No idea.
- Trying to run TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g with the latest GPTQ-for-LLaMa CUDA branch
git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda
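A possible follow-up, assuming the web UI's bundled download-model.py script and its usual owner_model folder naming; the -128g suffix indicates the checkpoint expects --groupsize 128:
python download-model.py TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
python server.py --model TheBloke_vicuna-13B-1.1-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type llama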
starcoder
- StarCoder: A language model trained on source code and natural language text
- OpenAI is screwing themselves hard.
Use a local LLM like: https://github.com/bigcode-project/starcoder
- Guanaco 7B, 13B, 33B and 65B models by Tim Dettmers: now for your local LLM pleasure
Here's the script I use to merge a LoRA onto a base model: https://gist.github.com/TheBloke/d31d289d3198c24e0ca68aaf37a19032 (a slightly modified version of https://github.com/bigcode-project/starcoder/blob/main/finetune/merge_peft_adapters.py)
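For reference, a hypothetical invocation of that script (argument names follow the upstream merge_peft_adapters.py; TheBloke's gist may differ), using Guanaco 7B from this thread as the example adapter:
python merge_peft_adapters.py --base_model_name_or_path huggyllama/llama-7b --peft_model_path timdettmers/guanaco-7b  # adapter repo name is illustrative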
- Starhugger.el now displays suggestions as overlays
Consult: https://github.com/bigcode-project/starcoder/issues/6 and https://huggingface.co/bigcode/starcoder/discussions/12
- FLaNK Stack for 15 May 2023
- GPT-4 Week 7. Government oversight, Strikes, Education, Layoffs & Big tech are moving - Nofil's Weekly Breakdown
StarCoder - the biggest open-source code LLM. There's a free VS Code extension for it. Looks great for coding; makes you wonder how long things like GitHub Copilot and Ghostwriter can afford to charge when open source is building things like this. Link to GitHub [Link] Link to HF [Link]
- Model For Just Coding
Outside of just using GPT-4, which works well, this is supposedly the solution, though I haven't tried it yet: starcoder/README.md at main · bigcode-project/starcoder · GitHub
- BigCode Project Releases StarCoder: A 15B Code LLM
- StarCoder 15b open-source code model beats Codex and Replit
GitHub link: https://github.com/bigcode-project/starcoder/tree/main
- StarCoder
What are some alternatives?
exllama - A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
langflow - ⛓️ Langflow is a dynamic graph where each node is an executable unit. Its modular and interactive design fosters rapid experimentation and prototyping, pushing hard on the limits of creativity.
koboldcpp - A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
SillyTavern - LLM Frontend for Power Users. [Moved to: https://github.com/SillyTavern/SillyTavern]
GPTQ-for-LLaMa - 4 bits quantization of LLaMA using GPTQ
Rath - Next generation of automated data exploratory analysis and visualization platform.
one-click-installers - Simplified installers for oobabooga/text-generation-webui.
openvino_notebooks - 📚 Jupyter notebook tutorials for OpenVINO™
private-gpt - Interact with your documents using the power of GPT, 100% privately, no data leaks
Local-LLM-Comparison-Colab-UI - Compare the performance of different LLM that can be deployed locally on consumer hardware. Run yourself with Colab WebUI.