dataset

Reinforcement-Learning-for-Human-Feedback-RLHF

SJ9VRF


Overview

Name

Source

SJ9VRF

Episodes

Robot count

Format

other

Description

This repository implements a Reinforcement Learning from Human Feedback (RLHF) system trained on custom datasets. The project uses the trlX library to train a preference model that incorporates human feedback directly into the optimization of language models.
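As a rough illustration of what training a preference model can involve, the sketch below computes a pairwise (Bradley-Terry style) preference loss over scalar reward scores. It is not taken from this repository and does not use the trlX API; the function names `pairwise_preference_loss` and `mean_preference_loss` and the example scores are hypothetical, and the repository may use a different objective.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper: the loss is small when the human-preferred
    response scores higher than the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs):
    """Average the pairwise loss over (chosen, rejected) score pairs."""
    losses = [pairwise_preference_loss(chosen, rejected) for chosen, rejected in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Made-up reward scores, for illustration only.
    example_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(mean_preference_loss(example_pairs))
```

In this sketch, a reward model that ranks the preferred response above the rejected one by a wide margin yields a loss near zero, while a reversed ranking is penalized roughly linearly in the margin.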

Robots used

null

Links

HuggingFace dataset

null