dataset
Reinforcement-Learning-for-Human-Feedback-RLHF
SJ9VRF
Overview
Name
Reinforcement-Learning-for-Human-Feedback-RLHF
Source
SJ9VRF
Episodes
0
Robot count
0
Format
other
Description
This repository implements a Reinforcement Learning from Human Feedback (RLHF) system on custom datasets. The project uses the trlX library to train a preference model that incorporates human feedback directly into the optimization of language models.
Robots used
null
Links
HuggingFace dataset
null