hh-rlhf Alternatives
Similar projects and alternatives to hh-rlhf
- text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models.
- awesome-RLHF: A curated list of reinforcement learning with human feedback resources (continually updated).
- chatllama: ChatLLaMA 📢 Open source implementation of a LLaMA-based ChatGPT, runnable on a single GPU, with a 15x faster training process than ChatGPT.
hh-rlhf reviews and mentions
- Meta wants its open source AI model to be as capable as OpenAI's best model
If you ask an LLM to complete a sentence like '[Insert name] stole the fruit (true/false):', an aligned LLM will be biased towards refusing to answer at all, with something like: "I can't tell you because I don't know them."
An "uncensored" LLM will very happily return "true" or "false" with a probability attached to each. Even OpenAI's GPT-3 does so with a low enough temperature.
Of course, LLM attention doesn't work like that. The tokens are just a bag of numbers:
- The fact the name 'John' is mentioned in the Bible a lot affects the distribution when you ask if any John stole, because John is always [7554]
- The fact that 'Olf' is part of Adolf and Adolf Hitler is mentioned in a lot of negative sentences will drag the distribution, because 'Olf' is always [4024] and Adolf is always [324, 4024]
You could have asked something with no logical probability difference at all, like:
- 'The store attendant's name was [name], did the child in Long Island drop his ball (true/false):'
And unless you train the model to give you disclaimers, it still follows the instruction faithfully and returns true/false with probabilities, demonstrating a deep regression in reasoning...
That's why for models past a certain size, alignment increases performance: https://arxiv.org/abs/2204.05862.
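To make the mechanics concrete, here is a minimal sketch of reading the probability a base language model assigns to "true" versus "false" as the next token. The comment names no specific model or library; GPT-2 via Hugging Face transformers is used here purely as an assumed stand-in.

```python
# Minimal sketch: probe the next-token probabilities a base (non-aligned)
# LM assigns to "true" / "false". GPT-2 is an assumed stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = ("The store attendant's name was John, "
          "did the child in Long Island drop his ball (true/false):")
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # logits for the next token only
probs = torch.softmax(logits, dim=-1)

# The leading space matters: GPT-2's BPE treats " true" and "true"
# as different tokens.
for word in (" true", " false"):
    token_id = tokenizer.encode(word)[0]
    print(f"P({word.strip()!r}) = {probs[token_id].item():.4f}")
```

A base model happily reports both probabilities for this logically unanswerable question, which is exactly the behavior the comment describes.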
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- OpenDILab Awesome Paper Collection: RL with Human Feedback (3)
Title: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA
It just hasn't been prompted or fine-tuned to have the neutral, self-effacing personality of ChatGPT.
It's doing the pure "try to guess the most likely next token" task on which they were both trained (https://heartbeat.comet.ml/causal-language-modeling-with-gpt...) (before the reinforcement learning from human feedback that makes them more tool-like https://arxiv.org/abs/2204.05862), with a bit of randomness added for variety's sake (https://huggingface.co/blog/how-to-generate).
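As a rough illustration of that "bit of randomness", here is a sketch of temperature sampling, the standard mechanism behind varied generations. The model choice and temperature value are assumptions for illustration, not something the comment specifies.

```python
# Sketch of temperature sampling: scale the logits, softmax, then draw a
# token at random instead of always taking the argmax. GPT-2 is an
# assumed stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The meaning of life is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]

temperature = 0.8                                  # <1 sharpens, >1 flattens
probs = torch.softmax(logits / temperature, dim=-1)
next_id = torch.multinomial(probs, num_samples=1)  # stochastic pick
print(tokenizer.decode(next_id))                   # varies run to run
```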
- [D] Is Anthropic influential in research?
They have done good work, like releasing the paper and dataset for training an assistant with RLHF. https://github.com/anthropics/hh-rlhf
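For reference, that preference data can also be pulled from the Hugging Face Hub. The "Anthropic/hh-rlhf" mirror and its chosen/rejected fields are assumptions based on the public release, not stated in the comment.

```python
# Sketch: load the hh-rlhf preference pairs via the datasets library.
# The "Anthropic/hh-rlhf" Hub mirror name is assumed here.
from datasets import load_dataset

ds = load_dataset("Anthropic/hh-rlhf", split="train")

# Each record pairs a human-preferred ("chosen") completion with a
# rejected one, the format used to train an RLHF reward model.
example = ds[0]
print(example["chosen"][:300])
print(example["rejected"][:300])
```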
- [R] Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned - Anthropic - Ganguli et al 2022
GitHub: https://github.com/anthropics/hh-rlhf
Stats
anthropics/hh-rlhf is an open source project licensed under the MIT License, an OSI-approved license.