Metric | trl | llama-recipes
---|---|---
Mentions | 13 | 9
Stars | 8,176 | 9,418
Growth | 4.9% | 12.1%
Activity | 9.7 | 9.8
Last commit | 1 day ago | 2 days ago
Language | Python | Jupyter Notebook
License | Apache License 2.0 | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
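The page does not give the exact formula behind Growth. As a rough sketch, assuming it is the ordinary relative change `(current - previous) / previous`, the star counts and growth figures listed above imply the following counts one month earlier. The helper name `implied_previous_stars` is made up for this illustration.

```python
def implied_previous_stars(current_stars: int, growth_percent: float) -> float:
    """Invert growth = (current - previous) / previous * 100 to recover `previous`.

    Assumption: Growth is a plain month-over-month relative change.
    """
    return current_stars / (1 + growth_percent / 100)


# Star counts and growth percentages as listed for the two projects.
for name, stars, growth in [("trl", 8176, 4.9), ("llama-recipes", 9418, 12.1)]:
    previous = implied_previous_stars(stars, growth)
    gained = stars - previous
    print(f"{name}: about {previous:.0f} stars a month earlier, about {gained:.0f} gained")
```

Under that assumption, llama-recipes gained more stars in absolute terms as well as in percentage terms.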
trl
- FLaNK Stack 29 Jan 2024
- OOM Error while using TRL for RLHF Fine-tuning
  I am using TRL for RLHF fine-tuning of the Llama-2-7B model and am getting an OOM error (even with batch_size=1). If anyone has used TRL for RLHF, can you please tell me what I am doing wrong? Code details can be found in the GitHub issue.
- [D] Tokenizers Truncation during Fine-tuning with Large Texts
  SFTTrainer from Hugging Face
- New Open-source LLMs! The Falcon has landed! 7B and 40B
  For LoRA, PEFT seems to work. I don't have the patience to wait 5 hours, but modifying this example seems to work. You don't even need to modify much, since their model, like GPT-NeoX, uses the query_key_value name for self-attention.
- [D] Using RLHF beyond preference tuning
  They have examples of making GPT output more positive by using a sentiment model as the reward. There are other examples about reducing toxicity and summarization here: https://github.com/lvwerra/trl/tree/main/examples. It should be fairly simple to modify the sentiment example and try the calculator reward you mentioned above.
- [R] Unlock the Power of Personal AI: Introducing ChatLLaMA, Your Custom Personal Assistant!
  You can use this: https://github.com/lvwerra/trl/blob/main/examples/sentiment/scripts/gpt-neox-20b_peft/merge_peft_adapter.py
- [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003
  Just the hh directly. From the results it seems like it might be enough, but I might also try instruction tuning and then running the whole process from that base. I will also be running the reinforcement learning with a LoRA, using this as an example: https://github.com/lvwerra/trl/tree/main/examples/sentiment/scripts/gpt-neox-20b_peft
- [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF)
  This package is pretty simple to use! https://github.com/lvwerra/trl
- Transformer Reinforcement Learning
- trl: Train transformer language models with reinforcement learning
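
The RLHF comment above suggests swapping the sentiment-model reward for a task-specific one. Here is a minimal sketch of what such a reward function might look like, written in plain Python and independent of trl. The names `calculator_reward` and `score_batch` are hypothetical, and the scoring rule (1.0 when the first integer in the response equals the result of a simple `a op b` query) is an assumption, since the "calculator reward" from that thread is not described here.

```python
import re
from typing import Callable, List

# A reward function maps a (query, response) pair to a scalar score.
RewardFn = Callable[[str, str], float]

_QUERY = re.compile(r"\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=?\s*")


def calculator_reward(query: str, response: str) -> float:
    """Hypothetical reward: 1.0 if the response's first integer is the correct result."""
    match = _QUERY.fullmatch(query)
    if match is None:
        return 0.0  # Not an arithmetic query this sketch understands.
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    expected = {"+": a + b, "-": a - b, "*": a * b}[op]
    answer = re.search(r"-?\d+", response)
    if answer is None:
        return 0.0
    return 1.0 if int(answer.group()) == expected else 0.0


def score_batch(reward_fn: RewardFn, queries: List[str], responses: List[str]) -> List[float]:
    """Score each generated response; an RL training loop would consume these values."""
    return [reward_fn(q, r) for q, r in zip(queries, responses)]


rewards = score_batch(calculator_reward, ["2 + 3 =", "7 - 10 ="], ["5", "The answer is 3"])
```

In a real setup, scores like these would stand in for the sentiment classifier's output in the training loop. How they are passed to the trainer depends on the trl version in use.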
llama-recipes
- Prompt Engineering with Llama2
- Purple Llama by Meta AI
  There are a whole bunch of prompts for this here: https://github.com/facebookresearch/llama-recipes/commit/109...
- [D] Recommendation for LLM fine-tuning codebase
- FLaNK Stack Weekly for 27 November 2023
- Finetune codellama for code completion task on specific programming language
- [D] Tokenizers Truncation during Fine-tuning with Large Texts
  llama-recipes
- How to fine tune llama2?
  You can also try the recipe here: https://github.com/facebookresearch/llama-recipes/blob/main/quickstart.ipynb
- Examples and recipes for Llama 2 model
- Llama Recipes
What are some alternatives?
lm-human-preferences - Code for the paper Fine-Tuning Language Models from Human Preferences
FLaNK-OpenAi - Chat
alpaca-lora - Instruct-tune LLaMA on consumer hardware
llm-toys - Small (7B and below) finetuned LLMs for a diverse set of useful tasks
trlx - A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLaMA-8bit-LoRA - Repository for Chat LLaMA - training a LoRA for the LLaMA (1 or 2) models on HuggingFace with 8-bit or 4-bit quantization. Research only.
CogVLM - a state-of-the-art-level open visual language model | multimodal pretrained model
sparsegpt-for-LLaMA - Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
BakLLaVA
Deep_Object_Pose - Deep Object Pose Estimation (DOPE) - ROS inference (CoRL 2018)
pymobiledevice3 - Pure python3 implementation for working with iDevices (iPhone, etc...).