[D] Applications for using reinforcement learning to fine-tune GPT-2

trl

Mentions: 13 | Stars: 8,023 | Activity: 9.6 | Language: Python

Train transformer language models with reinforcement learning.

I'm investigating the pros and cons of a more naive approach that does not require collecting a dataset of human preferences. Using the trl library, I train a BERT classifier to distinguish between sarcastic and non-sarcastic Reddit comments, and that classifier then serves as a reward model that provides the reward signal for fine-tuning GPT-2 for text generation using PPO. I have applied the same method to the task of generating negative reviews, by training BERT on the IMDB dataset. This method of course leads to extensive reward hacking, but investigating how to mitigate that is part of the fun!
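One common way to limit the reward hacking mentioned above is the KL penalty used in Fine-Tuning Language Models from Human Preferences: the classifier score is credited to the final token, and every token is penalized for drifting away from the original, frozen language model. The sketch below is illustrative only. The function name `kl_shaped_rewards`, its parameters, and the default `kl_coef` are assumptions made for this example; it is not trl's API and not necessarily what the poster did.

```python
from typing import List


def kl_shaped_rewards(
    classifier_score: float,
    policy_logprobs: List[float],
    reference_logprobs: List[float],
    kl_coef: float = 0.2,  # assumed value, chosen only for illustration
) -> List[float]:
    """Return per-token rewards: a KL penalty on every token, plus the
    classifier score added to the last token of the generated sequence."""
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        return []

    # Penalize each token by how much more likely the fine-tuned policy makes
    # it than the frozen reference model (a per-token KL estimate).
    rewards = [
        -kl_coef * (policy_lp - ref_lp)
        for policy_lp, ref_lp in zip(policy_logprobs, reference_logprobs)
    ]

    # The classifier judges the whole sequence, so its score is credited
    # to the final token only.
    rewards[-1] += classifier_score
    return rewards


if __name__ == "__main__":
    # Hypothetical numbers: the second token has drifted from the reference.
    print(kl_shaped_rewards(2.0, [-1.0, -0.5], [-1.0, -1.5], kl_coef=0.5))
```

Raising `kl_coef` keeps generations closer to the original GPT-2 at the cost of a weaker push toward what the classifier rewards; lowering it does the opposite, which is where degenerate, reward-hacked text tends to appear.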

lm-human-preferences

Mentions: 8 | Stars: 1,106 | Activity: 2.7 | Language: Python

Code for the paper Fine-Tuning Language Models from Human Preferences

The code for https://arxiv.org/abs/1909.08593 can be found at https://github.com/openai/lm-human-preferences

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning. Post date: 19 Mar 2022
