dataset
hh-rlhf-helpful-base
trl-lib
Overview
Name
hh-rlhf-helpful-base
Source
trl-lib
Episodes
0
Robot count
0
Format
parquet
Description
HH-RLHF-Helpful-Base Dataset
Summary
The HH-RLHF-Helpful-Base dataset is a processed version of Anthropic's HH-RLHF dataset, prepared for training models with the TRL library on preference learning and alignment tasks. It contains pairs of responses; within each pair, one response is labeled "chosen" and the other "rejected," based on human judgments of helpfulness. This dataset enables models to learn human preferences in generating helpful responses… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/hh-rlhf-helpful-base.
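To make the chosen/rejected structure concrete, here is a minimal Python sketch. It rests on assumptions not confirmed by this page: that each record exposes fields named `chosen` and `rejected`, and that a `train` split exists. The helper names `is_preference_pair` and `load_helpful_base` and the sample record are hypothetical and for illustration only; check the dataset page for the actual schema.

```python
from typing import Any, Dict

# Assumed column names; verify against the dataset page before relying on them.
REQUIRED_FIELDS = ("chosen", "rejected")


def is_preference_pair(record: Dict[str, Any]) -> bool:
    """Return True if the record has non-empty 'chosen' and 'rejected' entries."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def load_helpful_base(split: str = "train"):
    """Fetch the dataset from the Hugging Face Hub.

    Needs network access and the `datasets` package; the split name is an assumption.
    """
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset("trl-lib/hh-rlhf-helpful-base", split=split)


# Hypothetical record, invented for illustration; not taken from the dataset.
example = {
    "chosen": "You can steep green tea for two to three minutes.",
    "rejected": "I don't know.",
}

print(is_preference_pair(example))                       # complete pair
print(is_preference_pair({"chosen": "only one side"}))   # missing 'rejected'
```

Calling `load_helpful_base()` and applying `is_preference_pair` to a few rows is a quick way to confirm that the assumed field names match what the Hub actually serves.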
Robots used
null
Links
HuggingFace dataset