awesome-RLHF vs hh-rlhf

| | awesome-RLHF | hh-rlhf |
|---|---|---|
| Mentions | 6 | 6 |
| Stars | 2,775 | 1,447 |
| Growth | 5.5% | 2.5% |
| Activity | 7.0 | 3.6 |
| Latest commit | 9 days ago | 8 months ago |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
awesome-RLHF
-
OpenDILab Awesome Paper Collection: RL with Human Feedback (3)
Recently, OpenDILab released a paper collection on Reinforcement Learning with Human Feedback (RLHF), open-sourced on GitHub. The repository is dedicated to helping researchers collect the latest papers on RLHF so they can get to know the area more easily.
-
Awesome RLHF (RL with Human Feedback): a collection of research papers for Reinforcement Learning with Human Feedback (RLHF). The repository will be continuously updated to track the frontier of RLHF. Welcome to follow and star! https://github.com/opendilab/awesome-RLHF
-
OpenDILab Awesome Paper Collection: RL with Human Feedback (1)
Here we introduce a new repository open-sourced by OpenDILab: a paper collection about Reinforcement Learning with Human Feedback (RLHF), now available on GitHub. The repository is dedicated to helping researchers collect the latest papers on RLHF so they can get to know the area more easily. About RLHF: Reinforcement Learning with Human Feedback (RLHF) is an extended branch of Reinforcement Learning (RL). When the optimization goal is abstract and a specific reward function is very difficult to define, RLHF incorporates human feedback into the training process. The feedback is used to build a reward-model neural network that provides reward signals for the RL agent to learn from, so human needs, preferences, and attitudes can be conveyed to the agent naturally through interactive learning, aligning the optimization goals of humans and AI to produce systems that behave in a manner consistent with human values.
- A collection of research papers for Reinforcement Learning with Human Feedback (RLHF)
hh-rlhf
-
Meta wants its open source AI model to be as capable as OpenAI’s best model
If you ask an LLM to complete a sentence like '[Insert name] stole the fruit (true/false):'
An aligned LLM will be biased toward refusing to answer at all, with something like: "I can't tell you because I don't know them."
An "uncensored" LLM will happily return "true" or "false" with a probability attached to each; even OpenAI's GPT-3 does, at a low enough temperature.
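As a minimal sketch of what "a probability attached to each" means: a base model's preference between "true" and "false" is just a softmax over its next-token logits. The logit values below are made up for illustration, not taken from any real model:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a dict of token -> logit.
    m = max(logits.values())
    exps = {tok: math.exp(x - m) for tok, x in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical next-token logits after the prompt
# "John stole the fruit (true/false):"
logits = {"true": 2.1, "false": 1.3, "I": 0.4}

probs = softmax(logits)
print(probs)  # most of the probability mass lands on "true"
```

Reading off these probabilities directly (rather than sampling) is what the comment means by answering "with a low enough temperature."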
Of course, LLM attention doesn't work like that. The tokens are just a bag of numbers:
- The fact the name 'John' is mentioned in the Bible a lot affects the distribution when you ask if any John stole, because John is always [7554]
- The fact that 'Olf' is part of Adolf and Adolf Hitler is mentioned in a lot of negative sentences will drag the distribution, because 'Olf' is always [4024] and Adolf is always [324, 4024]
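The context-independence of token IDs described above can be sketched with a toy subword vocabulary. The IDs are the ones quoted in the comment, used purely illustratively; real tokenizers have far larger vocabularies and their own merge rules:

```python
# Toy subword vocabulary: every surface form maps to the same
# fixed ID no matter what sentence it appears in.
vocab = {"John": 7554, "Ad": 324, "olf": 4024}

def encode(pieces):
    """Map a pre-split list of subword pieces to their fixed IDs."""
    return [vocab[p] for p in pieces]

# "John" is ID 7554 whether it appears in a Bible verse or in
# "John stole the fruit", so statistics the model learned about
# ID 7554 bleed across contexts.
print(encode(["John"]))        # [7554]
# "Adolf" decomposes into the same two IDs everywhere, so
# associations with the pair drag the distribution along.
print(encode(["Ad", "olf"]))   # [324, 4024]
```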
You could have asked something with no logical probability difference at all, like:
- 'The store attendant's name was [name], did the child in Long Island drop his ball (true/false):'
And unless you train the model to give you disclaimers, it still follows the instruction faithfully and returns true/false with probabilities, demonstrating a deep regression in reasoning...
That's why for models past a certain size, alignment increases performance: https://arxiv.org/abs/2204.05862.
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
-
OpenDILab Awesome Paper Collection: RL with Human Feedback (3)
Title: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
-
Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA
It just hasn't been prompted or fine-tuned to have the neutral, self-effacing personality of ChatGPT.
It's doing the pure "try to guess the most likely next token" task on which they were both trained (https://heartbeat.comet.ml/causal-language-modeling-with-gpt...), before the reinforcement learning from human feedback that makes them more tool-like (https://arxiv.org/abs/2204.05862), with a bit of randomness added for variety's sake (https://huggingface.co/blog/how-to-generate).
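The "bit of randomness" is temperature sampling: divide the logits by a temperature before the softmax, then draw from the resulting distribution. A minimal sketch with toy logits (the tokens and values are invented for illustration):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    # Lower temperature sharpens the distribution toward the most
    # likely token; higher temperature flattens it toward uniform.
    scaled = {tok: x / temperature for tok, x in logits.items()}
    m = max(scaled.values())
    exps = {tok: math.exp(x - m) for tok, x in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Inverse-CDF sampling over the token distribution.
    r = rng.random()
    cum = 0.0
    for tok, p in probs.items():
        cum += p
        if r < cum:
            return tok
    return tok  # guard against floating-point round-off

logits = {"cat": 3.0, "dog": 1.0, "fish": 0.5}
print(sample_next_token(logits, temperature=0.7))
```

At temperature near zero this reduces to greedy decoding (always the top token), which is why a low enough temperature effectively exposes the raw argmax answer.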
-
[D] Is Anthropic influential in research?
They have done good work like releasing their paper and dataset for training an assistant RLHF model. https://github.com/anthropics/hh-rlhf
-
[R] Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned - Anthropic - Ganguli et al 2022
Github: https://github.com/anthropics/hh-rlhf
What are some alternatives?
Practical_RL - A course in reinforcement learning in the wild
nebuly - The user analytics platform for LLMs
LaMDA-rlhf-pytorch - Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.
stanford_alpaca - Code and documentation to train Stanford's Alpaca models, and generate the data.
deep-learning-drizzle - Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
alpaca-7b-truss
visual-chatgpt - Official repo for the paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [Moved to: https://github.com/microsoft/TaskMatrix]
alpaca-lora - Instruct-tune LLaMA on consumer hardware
LLM-As-Chatbot - LLM as a Chatbot Service
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
til - Today I Learned