[D] Using RLHF beyond preference tuning


trl

Mentions: 13 | Stars: 8,120 | Activity: 9.7 | Language: Python

Train transformer language models with reinforcement learning.

The trl repository has an example that makes GPT output more positive by using a sentiment model as the reward. There are other examples, covering toxicity reduction and summarization, here: https://github.com/lvwerra/trl/tree/main/examples. It should be fairly simple to modify the sentiment example and try the calculator reward you mentioned above.
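As a rough sketch of what swapping the reward could look like: the function below scores a response by whether the first number in it matches the true value of an arithmetic question. The original post's definition of "calculator reward" is not shown here, so this interpretation is an assumption, and `safe_eval` and `calculator_reward` are hypothetical names that are not part of trl. In the sentiment example, a score like this would take the place of the sentiment model's output; check the example itself for how rewards are passed to the trainer.

```python
import ast
import operator
import re

# Hypothetical helpers for illustration only; none of this is trl API.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def safe_eval(expr: str) -> float:
    """Evaluate a plain arithmetic expression (+, -, *, /) without eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))


def calculator_reward(question: str, response: str) -> float:
    """Assumed scheme: 1.0 if correct, 0.0 if wrong, -1.0 if no number is given."""
    match = _NUMBER.search(response)
    if match is None:
        return -1.0
    try:
        expected = safe_eval(question)
    except (ValueError, SyntaxError, ZeroDivisionError):
        # The question itself cannot be scored, so give a neutral reward.
        return 0.0
    return 1.0 if abs(float(match.group()) - expected) < 1e-6 else 0.0


if __name__ == "__main__":
    print(calculator_reward("12 * 7", "84"))
    print(calculator_reward("12 * 7", "The answer is 85"))
    print(calculator_reward("12 * 7", "no idea"))
```

Only the first number in the response is checked, so a response that restates the question before answering would be scored as wrong; a real reward would probably need stricter output formatting or a smarter parser.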


NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning. Post date: 14 Apr 2023
