empirical-philosophy vs alpaca_lora_4bit

| | empirical-philosophy | alpaca_lora_4bit |
|---|---|---|
| Mentions | 9 | 41 |
| Stars | 141 | 529 |
| Growth | - | - |
| Activity | 2.5 | 8.6 |
| Latest commit | about 1 year ago | 5 months ago |
| Language | TypeScript | Python |
| License | - | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
empirical-philosophy
- Google “We Have No Moat, and Neither Does OpenAI”
One way that I've been framing this in my head (and in an application I'm building) is that GPT-3 will be useful for analytic tasks, whereas GPT-4 will be required for synthetic tasks. I'm using "analytic" and "synthetic" in the same way as in this writeup: https://github.com/williamcotton/empirical-philosophy/blob/m...
- How ReAct Prompting Works in Detail
- Ask HN: People who were laid off or quit recently, how are you doing?
Hey Simon! I've been digging your writings on LLMs lately.
I've been having some decent luck with some of the approaches that I've discussed in the following articles and projects:
From Prompt Alchemy to Prompt Engineering: An Introduction to Analytic Augmentation: https://github.com/williamcotton/empirical-philosophy/blob/m...
https://www.williamcotton.com/articles/writing-web-applicati...
https://github.com/williamcotton/transynthetical-engine
I'd love to hear your thoughts on the matter!
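To make the "analytic augmentation" idea from those links concrete: instead of asking the model for a free-text answer, the prompt asks it to emit a small program, and the program computes the answer. Below is a minimal Python sketch assuming an OpenAI-style chat client; the prompt wording, model name, and `ask_analytic` helper are illustrative stand-ins, not the transynthetical-engine's actual API.

```python
# Rough sketch of "analytic augmentation": the model returns code and the
# code computes the answer, rather than the model recalling it from memory.
# Hypothetical helper; prompt wording and model name are placeholders.
from openai import OpenAI  # assumes the `openai` package is installed

client = OpenAI()

PROMPT = (
    "Translate the question into a single Python statement that assigns "
    "the final result to a variable named `answer`. Reply with only the "
    "code, no prose.\n\nQuestion: "
)

def ask_analytic(question: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT + question}],
    )
    code = completion.choices[0].message.content.strip()
    scope: dict = {}
    exec(code, {}, scope)  # evaluate the generated code (sketch only; sandbox in practice)
    return str(scope["answer"])

print(ask_analytic("What is 1588 multiplied by 23?"))
```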
- We need to tell people ChatGPT will lie to them, not debate linguistics
You’re not actually doing any research.
Here is my research: https://github.com/williamcotton/empirical-philosophy/blob/m...
It is clear that analytic augmentations will result in more factual information.
Your claims are unfounded and untested.
- ChatGPT and Wolfram Is Insane
Take a look at
https://github.com/williamcotton/empirical-philosophy/blob/m...
https://langchain.readthedocs.io/en/latest/
They can be taught!
- Prompt Engineering Guide: Guides, papers, and resources for prompt engineering
I've been developing a methodology around prompt engineering that I have found very useful:
https://github.com/williamcotton/empirical-philosophy/blob/m...
A few more edits and it's ready for me to submit to HN and then get literally no further attention!
- Professor writes history essays with ChatGPT and has students correct them
That's not a rebuttable of a claim that Bing is more accurate.
A proper rebuttable would involve empirical evidence that Bing is no more accurate than other LLM tools that do not add analytical augmentations such as search results to their prompts.
Based on empirical evidence, I find that analytical augmentations do indeed result in more accurate results:
https://github.com/williamcotton/empirical-philosophy/blob/m...
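For readers following the thread, the augmentation being debated is simply the step of pasting retrieved evidence into the prompt before asking the question. A minimal Python sketch follows, assuming the search snippets have already been fetched by some retrieval step; the function name, prompt wording, and model choice are illustrative.

```python
# Minimal sketch of augmenting a prompt with search results.
# The snippets would come from a real search/retrieval step; here they are
# passed in directly. Prompt wording and model name are placeholders.
from openai import OpenAI

client = OpenAI()

def answer_with_sources(question: str, snippets: list[str]) -> str:
    evidence = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the question using only the numbered sources below and cite "
        "the source numbers you relied on. If the sources are insufficient, "
        "say so.\n\n"
        f"Sources:\n{evidence}\n\nQuestion: {question}"
    )
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

print(answer_with_sources(
    "When was the James Webb Space Telescope launched?",
    ["NASA: JWST launched on 25 December 2021 from Kourou, French Guiana."],
))
```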
alpaca_lora_4bit
- Open Inference Engine Comparison | Features and Functionality of TGI, vLLM, llama.cpp, and TensorRT-LLM
For training there is also https://github.com/johnsmith0031/alpaca_lora_4bit
- Quantized 8k Context Base Models for 4-bit Fine Tuning
I've been trying to fine-tune an erotica model on some large-context chat history (reverse proxy logs) and a literotica-instruct dataset I made, with a max context of 8k. The large context size eats a lot of VRAM, so I've been trying to find the most efficient way to experiment, since I'd like to do multiple runs to test some ideas. So I'm going to try https://github.com/johnsmith0031/alpaca_lora_4bit, which is supposed to train faster and use less memory than qlora.
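For context on the qlora side of that comparison, here is what the generic 4-bit NF4 + LoRA setup looks like with bitsandbytes and PEFT. This is a hedged sketch of the QLoRA-style baseline, not alpaca_lora_4bit's GPTQ-based training path; the model name and hyperparameters are placeholders.

```python
# Generic QLoRA-style setup (bitsandbytes NF4 + LoRA via PEFT), shown for
# comparison; alpaca_lora_4bit instead trains a LoRA on a GPTQ-quantized base.
# Model name and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "huggyllama/llama-13b"  # placeholder base model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```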
- A simple repo for fine-tuning LLMs with both GPTQ and bitsandbytes quantization. Also supports ExLlama for inference for the best speed.
Following up on the popular work of u/tloen's alpaca-lora, I wrapped the setup of alpaca_lora_4bit to add support for GPTQ training in the form of installable pip packages. You can perform training and inference with multiple quantization methods to compare the results.
- Do we still need the monkey patch with the exllama loader for LoRA?
" Using LoRAs with GPTQ-for-LLaMa This requires using a monkey patch that is supported by this web UI: https://github.com/johnsmith0031/alpaca_lora_4bit"
- Why isn’t QLoRA being used more widely for fine tuning models?
4-bit GPTQ LoRA training has been available since early April. I did not see any comparison to it in the QLoRA paper, or even a mention, which makes me think the authors were not aware it already existed.
- Fine-tuning with alpaca_lora_4bit on 8k context SuperHOT models
- Any guide/intro to fine-tuning anywhere?
https://github.com/johnsmith0031/alpaca_lora_4bit is still the SOTA - Faster than qlora, trains on a GPTQ base.
- "Samantha-33B-SuperHOT-8K-GPTQ" now that's a great name for a true model.
I would also like to know how one would fine-tune this in 4-bit. I think one could take the 8K PEFT merged with the LLaMA weights, quantize it to 4-bit, and then train with https://github.com/johnsmith0031/alpaca_lora_4bit?
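The first step of that workflow, folding the 8K PEFT adapter into the LLaMA base weights, can be sketched with PEFT's merge utilities; the model and adapter names below are placeholders, and the GPTQ 4-bit quantization of the merged checkpoint would be a separate step before training with alpaca_lora_4bit.

```python
# Sketch of merging a LoRA/PEFT adapter into the base weights before 4-bit
# quantization. Model and adapter names are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_name = "huggyllama/llama-30b"            # placeholder base model
adapter_name = "some-user/superhot-8k-lora"   # placeholder 8K PEFT adapter

base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, adapter_name)
merged = merged.merge_and_unload()  # folds the LoRA weights into the base model

tokenizer = AutoTokenizer.from_pretrained(base_name)
merged.save_pretrained("llama-superhot-8k-merged")
tokenizer.save_pretrained("llama-superhot-8k-merged")
# GPTQ 4-bit quantization of the merged checkpoint would follow as a separate
# step, after which alpaca_lora_4bit can train a new LoRA on top of it.
```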
- Help with QLoRA
I was under the impression that you just git clone this repo into text-generation-webui/repositories (so you would have GPTQ_for_Llama and alpaca_lora_4bit in the folder) and then just load with the monkey patch. Is that not correct? I also tried downloading alpaca_lora_4bit on its own, git cloning text-gen-webui within it, installing requirements.txt for both, and running with the monkey patch. I was following the alpaca_lora_4bit sections "Text Generation Webui Monkey Patch" and "monkey patch inside webui".
- Best uncensored model for an a6000
I don't have any familiarity with ESXi, but I can say that there are quite a few posts about people doing it on Proxmox. I currently have a machine with 2x 3090s passing through to VMs. When I'm training, I pass them both through to the same VM and can do 4-bit LoRA training on LLaMA 33B using https://github.com/johnsmith0031/alpaca_lora_4bit. Then, at inference time, I run a single card in a different VM and have an extra card available for experimentation.
What are some alternatives?
magma-chat - Ruby on Rails 7-based ChatGPT Bot Platform
flash-attention - Fast and memory-efficient exact attention
pal - PaL: Program-Aided Language Models (ICML 2023)
qlora - QLoRA: Efficient Finetuning of Quantized LLMs
guardrails - Adding guardrails to large language models.
StableLM - StableLM: Stability AI Language Models
datasette-chatgpt-plugin - A Datasette plugin that turns a Datasette instance into a ChatGPT plugin
safetensors - Simple, safe way to store and distribute tensors
transynthetical-engine - Applied methods of analytical augmentation to build tools using large-language models.
alpaca-lora - Instruct-tune LLaMA on consumer hardware
stable-diffusion-webui - Stable Diffusion web UI
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.