trl
alpaca-lora
Metric | trl | alpaca-lora
---|---|---
Mentions | 13 | 107
Stars | 8,120 | 18,197
Growth | 4.3% | -
Activity | 9.7 | 3.6
Last commit | 4 days ago | 2 months ago
Language | Python | Jupyter Notebook
License | Apache License 2.0 | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
trl
- FLaNK Stack 29 Jan 2024
-
OOM Error while using TRL for RLHF Fine-tuning
I am using TRL for RLHF fine-tuning of the Llama-2-7B model and getting an OOM error (even with batch_size=1). If anyone has used TRL for RLHF, can you please tell me what I am doing wrong? Code details can be found in the GitHub issue.
-
[D] Tokenizers Truncation during Fine-tuning with Large Texts
SFTTrainer from Hugging Face
-
New Open-source LLMs! 🤯 The Falcon has landed! 7B and 40B
For LoRA, PEFT seems to work. I don't have the patience to wait 5 hours, but modifying this example seems to work. You don't even need to modify that much, as their model, just like GPT-NeoX, uses the query_key_value name for self-attention.
-
[D] Using RLHF beyond preference tuning
They have examples of making GPT output more positive (code) by using a sentiment model as reward. There are other examples about reducing toxicity, summarization here: https://github.com/lvwerra/trl/tree/main/examples . Should be fairly simple to modify the sentiment example and try the calculator reward you mentioned above.
-
[R] Unlock the Power of Personal AI: Introducing ChatLLaMA, Your Custom Personal Assistant!
You can use this -> https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt-neox-20b_peft/merge_peft_adapter.py
-
[R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003
Just the hh directly. From the results it seems like it might possibly be enough but I might also try instruction tuning then running the whole process from that base. I will also be running the reinforcement learning by using a Lora using this as an example https://github.com/lvwerra/trl/tree/main/examples/sentiment/scripts/gpt-neox-20b_peft
-
[R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF)
This package is pretty simple to use! https://github.com/lvwerra/trl
- Transformer Reinforcement Learning
- trl: Train transformer language models with reinforcement learning
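Several of the mentions above describe swapping the reward signal used for RLHF: a sentiment model in the trl examples, or a "calculator reward" in one thread. As a rough illustration only, a rule-based reward of that kind might look like the sketch below. The names `calculator_reward` and `score_batch`, the prompt format, and the 0/1 scoring scheme are all hypothetical and are not part of trl; in a real run the resulting floats would be handed to whatever RL trainer is in use.

```python
import re


def calculator_reward(prompt: str, response: str) -> float:
    """Hypothetical reward: 1.0 if the response contains the correct sum, else 0.0.

    Assumes prompts contain an addition such as "What is 12 + 30?".
    Prompts without a recognizable addition score 0.0.
    """
    match = re.search(r"(-?\d+)\s*\+\s*(-?\d+)", prompt)
    if match is None:
        return 0.0
    expected = int(match.group(1)) + int(match.group(2))
    # Collect every integer in the response and look for the expected value.
    numbers = [int(n) for n in re.findall(r"-?\d+", response)]
    return 1.0 if expected in numbers else 0.0


def score_batch(prompts, responses):
    """Score (prompt, response) pairs, returning one float per pair."""
    return [calculator_reward(p, r) for p, r in zip(prompts, responses)]
```

A sentiment-based reward follows the same shape: the function body would call a classifier instead of doing arithmetic, and the rest of the loop stays unchanged.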
alpaca-lora
-
How to deal with loss for SFT for CausalLM
Here is an example: https://github.com/tloen/alpaca-lora/blob/main/finetune.py
-
How to Finetune Llama 2: A Beginner's Guide
In this blog post, I want to make it as simple as possible to fine-tune the LLaMA 2 - 7B model, using as little code as possible. We will be using the Alpaca Lora Training script, which automates the process of fine-tuning the model and for GPU we will be using Beam.
-
Fine-tuning LLMs with LoRA: A Gentle Introduction
Implement the code in Llama LoRA repo in a script we can run locally
-
Newbie here - trying to install Alpaca Lora and hitting an error
Hi all - relatively new to GitHub / programming in general, and I wanted to try to set up Alpaca Lora locally. Following the guide here: https://github.com/tloen/alpaca-lora
-
A simple repo for fine-tuning LLMs with both GPTQ and bitsandbytes quantization. Also supports ExLlama for inference for the best speed.
Following up on the popular alpaca-lora work by u/tloen, I wrapped the setup of alpaca_lora_4bit to add support for GPTQ training in the form of installable pip packages. You can perform training and inference with multiple quantization methods to compare the results.
- FLaNK Stack Weekly for 20 June 2023
-
Converting to GGML?
If instead you want to apply a LoRA to a PyTorch model, a lot of people use this script to apply the LoRA to the 16-bit model and then quantize it with a GPTQ program afterwards: https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py
-
Simple LLM Watermarking - Open Llama 3b LORA
There are a few papers on watermarking LLM output, but from what I have seen they all use complex detection methods so that the watermark goes unseen by the end user and is detected only by an algorithm. I believe that a more overt system of watermarking might also be beneficial. One simple method that I have tried is character substitution. For this model, I LORA finetuned openlm-research/open_llama_3b on the alpaca_data_cleaned_archive.json dataset from https://github.com/tloen/alpaca-lora/, modified by replacing all instances of the "." character in the outputs with "ι". The results are pretty good, with the correct substitutions being generated by the model in most cases. It doesn't always work, but this was only a LORA training run of two epochs of 400 steps each, and 100% substitution isn't really required.
-
text-generation-webui's "Train Only After" option
I am kind of new to finetuning LLMs and am not able to understand what this option exactly refers to. I guess it has the same meaning as the "train_on_inputs" parameter of alpaca-lora though.
-
Learning sources on working with local LLMs
Read the paper and also: https://github.com/tloen/alpaca-lora
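The watermarking mention above describes a character-substitution scheme: every "." in the dataset outputs is replaced before fine-tuning, and the substitute character is later looked for in generated text. A minimal sketch of the data-preparation and detection steps, assuming Alpaca-style records with `instruction`, `input`, and `output` keys. The function names and the choice of substitute character are assumptions for illustration, not the post author's code.

```python
# Assumption: any rarely used character could serve as the watermark.
WATERMARK = "\u1fbe"


def watermark_outputs(records, mark=WATERMARK):
    """Return copies of the records with "." replaced in the "output" field only."""
    return [{**r, "output": r["output"].replace(".", mark)} for r in records]


def watermark_rate(text, mark=WATERMARK):
    """Fraction of period-like characters in `text` that are the watermark."""
    marked = text.count(mark)
    plain = text.count(".")
    total = marked + plain
    return marked / total if total else 0.0
```

Because the post notes that 100% substitution is not required, a detector built on `watermark_rate` would compare the rate against a threshold rather than demand that every period be replaced.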
What are some alternatives?
lm-human-preferences - Code for the paper Fine-Tuning Language Models from Human Preferences
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
trlx - A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
qlora - QLoRA: Efficient Finetuning of Quantized LLMs
LLaMA-8bit-LoRA - Repository for Chat LLaMA - training a LoRA for the LLaMA (1 or 2) models on HuggingFace with 8-bit or 4-bit quantization. Research only.
llama.cpp - LLM inference in C/C++
sparsegpt-for-LLaMA - Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
gpt4all - gpt4all: run open-source LLMs anywhere
llama-recipes - Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.
llama - Inference code for Llama models
Deep_Object_Pose - Deep Object Pose Estimation (DOPE) - ROS inference (CoRL 2018)
ggml - Tensor library for machine learning