LLM-As-Chatbot vs hh-rlhf

LLM-As-Chatbot

LLM as a Chatbot Service (by deep-diver)

hh-rlhf

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" (by anthropics)

Project	LLM-As-Chatbot	hh-rlhf
Mentions	3	6
Stars	3,242	1,447
Growth	-	2.5%
Activity	9.0	3.6
Latest Commit	6 months ago	8 months ago
Language	Python	-
License	Apache License 2.0	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

LLM-As-Chatbot

Posts with mentions or reviews of LLM-As-Chatbot. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-04-13.

OpenAI's GPT-4 Red Teamer Nathan Labenz: the GPT-4 base model recommends assassinating humans, naming specific targets
2 projects | /r/singularity | 13 Apr 2023

The first one is from https://github.com/deep-diver/Alpaca-LoRA-Serve

Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA
10 projects | news.ycombinator.com | 22 Mar 2023

this is useless because it doesn't handle context:
Q: Name five genres of music.
A: Jazz, country, hip-hop, blues, classical.
Q: Name a famous artist from the third genre.
A: Salvador Dalí.
Whereas this one actually supports context: https://github.com/deep-diver/Alpaca-LoRA-Serve

Show HN: Finetune LLaMA-7B on commodity GPUs using your own text
16 projects | news.ycombinator.com | 21 Mar 2023

hh-rlhf

Posts with mentions or reviews of hh-rlhf. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-14.

Meta wants its open source AI model to be as capable as OpenAI’s best model
1 project | news.ycombinator.com | 11 Sep 2023

If you ask an LLM to complete a sentence like '[Insert name] stole the fruit (true/false):'
An aligned LLM will be biased towards refusing to answer at all with something like: "I can't tell you because I don't know them."
An "uncensored" LLM will very happily return <"true"> or <"false"> with a probability attached to each. Even OpenAI's GPT-3 does with a low enough temperature.
Of course, LLM attention doesn't work like that. The tokens are just a bag of numbers:
- The fact the name 'John' is mentioned in the Bible a lot affects the distribution when you ask if any John stole, because John is always [7554]
- The fact that 'Olf' is part of Adolf and Adolf Hitler is mentioned in a lot of negative sentences will drag the distribution, because 'Olf' is always [4024] and Adolf is always [324, 4024]
You could have asked something with no logical probability difference at all, like:
- 'The store attendant's name was [name], did the child in Long Island drop his ball (true/false):'
And unless you train the model to give you disclaimers, it still follows the instruction faithfully and returns true/false with probabilities, demonstrating a deep regression in reasoning...
That's why for models past a certain size, alignment increases performance: https://arxiv.org/abs/2204.05862.

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human
1 project | news.ycombinator.com | 3 Aug 2023

OpenDILab Awesome Paper Collection: RL with Human Feedback （3）
2 projects | /r/u_OpenDILab | 14 May 2023

Title: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

Show HN: ChatLLaMA – A ChatGPT style chatbot for Facebook's LLaMA
10 projects | news.ycombinator.com | 22 Mar 2023

It just hasn't been prompted or fine-tuned to have the neutral, self-effacing personality of ChatGPT.
It's doing the pure "try to guess the most likely next token" task on which they were both trained (https://heartbeat.comet.ml/causal-language-modeling-with-gpt...), before the reinforcement learning from human feedback that makes them more tool-like (https://arxiv.org/abs/2204.05862), with a bit of randomness added for variety's sake (https://huggingface.co/blog/how-to-generate).
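The "guess the most likely next token, with a bit of randomness" behaviour described in the comment above can be illustrated with a small sketch of temperature sampling. This is not code from either project: the function names, the three-word vocabulary, and the logit values are all made up for illustration, and a real model would produce logits over tens of thousands of tokens.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng):
    """Draw one token index at random according to the tempered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical vocabulary and scores (assumptions, not real model output).
vocab = ["hip-hop", "blues", "classical"]
logits = [2.0, 1.0, 0.5]

rng = random.Random(0)  # seeded so repeated runs draw the same sequence
for temperature in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    token = vocab[sample_next_token(logits, temperature, rng)]
    print(temperature, [round(p, 3) for p in probs], token)
```

At a low temperature almost all of the probability mass sits on the highest-scoring token, so the output is close to deterministic; raising the temperature flattens the distribution and adds the variety the comment mentions.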

[D] Is Anthropic influential in research?
1 project | /r/MachineLearning | 30 Dec 2022

They have done good work like releasing their paper and dataset for training an assistant RLHF model. https://github.com/anthropics/hh-rlhf

[R] Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned - Anthropic - Ganguli et al 2022
1 project | /r/MachineLearning | 26 Aug 2022

Github: https://github.com/anthropics/hh-rlhf
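
For readers who want to work with the human preference data mentioned in the posts above, the following is a minimal sketch of reading preference pairs from JSON Lines text. It assumes each line is a JSON object with a "chosen" and a "rejected" transcript; that layout is an assumption not stated on this page and should be checked against the repository README. The helper name `parse_preference_lines` and the sample record are hypothetical and are not taken from the dataset.

```python
import json


def parse_preference_lines(lines):
    """Return (chosen, rejected) text pairs from JSON Lines input.

    Assumes every non-empty line is a JSON object with "chosen" and
    "rejected" keys (an assumed layout, not confirmed by this page).
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        pairs.append((record["chosen"], record["rejected"]))
    return pairs


# Made-up record for illustration only; not real dataset content.
sample_lines = [
    json.dumps({
        "chosen": "\n\nHuman: Hi\n\nAssistant: Hello! How can I help?",
        "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
    }),
    "",
]

pairs = parse_preference_lines(sample_lines)
for chosen, rejected in pairs:
    print("preferred:", chosen.strip().splitlines()[-1])
    print("rejected: ", rejected.strip().splitlines()[-1])
```

Pairs in this shape are what a preference (reward) model is typically trained on: the model is asked to score the chosen transcript above the rejected one.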

What are some alternatives?

When comparing LLM-As-Chatbot and hh-rlhf you can also consider the following projects:

alpaca-lora - Instruct-tune LLaMA on consumer hardware

nebuly - The user analytics platform for LLMs

simple-llm-finetuner - Simple UI for LLM Model Finetuning

stanford_alpaca - Code and documentation to train Stanford's Alpaca models, and generate the data.

peft - 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

awesome-RLHF - A curated list of reinforcement learning with human feedback resources (continually updated)

alpaca-7b-truss

alpaca.cpp - Locally run an Instruction-Tuned Chat-Style LLM (Android/Linux/Windows/Mac)

text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
