dataset

hh-rlhf-helpful-base

trl-lib


Overview

Name
hh-rlhf-helpful-base
Source
trl-lib
Episodes
0
Robot count
0
Format
parquet
Description
The HH-RLHF-Helpful-Base dataset is a processed version of Anthropic's HH-RLHF dataset, curated for training models with the TRL library on preference learning and alignment tasks. It contains pairs of text samples, each labeled either "chosen" or "rejected" according to human preferences about the helpfulness of the responses. This dataset enables models to learn human preferences in generating helpful responses… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/hh-rlhf-helpful-base.
Robots used
null
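
The description above mentions pairs of responses labeled "chosen" and "rejected". Below is a minimal sketch of how one such record could be represented and sanity-checked. It assumes, without confirmation from this page, that each row carries `prompt`, `chosen`, and `rejected` fields as plain strings; the `PreferencePair` class, the `parse_pair` helper, and the example row are all hypothetical and invented for illustration, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One human-preference record (hypothetical schema, not confirmed by the page)."""
    prompt: str
    chosen: str    # response the annotator preferred
    rejected: str  # response the annotator did not prefer


def parse_pair(row: dict) -> PreferencePair:
    """Validate a raw row and convert it to a PreferencePair."""
    missing = [key for key in ("prompt", "chosen", "rejected") if key not in row]
    if missing:
        raise KeyError(f"row is missing fields: {missing}")
    if row["chosen"] == row["rejected"]:
        # A pair with identical responses carries no preference signal.
        raise ValueError("chosen and rejected responses are identical")
    return PreferencePair(row["prompt"], row["chosen"], row["rejected"])


# Invented example row, not real data from the dataset.
example_row = {
    "prompt": "How do I boil an egg?",
    "chosen": "Place the egg in boiling water for about ten minutes.",
    "rejected": "I don't know.",
}
pair = parse_pair(example_row)
```

If the actual columns differ (for example, if responses are stored as lists of chat messages rather than strings), the field types in the sketch would need to change accordingly; the dataset page linked in the description is the authority on the real schema.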

Links

HuggingFace dataset