dataset

hh-rlhf-helpful-base

trl-lib


Overview

Name

hh-rlhf-helpful-base

Source

trl-lib

Episodes

Robot count

Format

parquet

Description

HH-RLHF-Helpful-Base Dataset Summary

The HH-RLHF-Helpful-Base dataset is a processed version of Anthropic's HH-RLHF dataset, curated for training models with the TRL library on preference learning and alignment tasks. It contains pairs of text samples, each labeled as either "chosen" or "rejected" according to human preferences about the helpfulness of the responses. This dataset enables models to learn human preferences in generating helpful responses… See the full description on the dataset page: https://huggingface.co/datasets/trl-lib/hh-rlhf-helpful-base.
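The pair structure described above can be sketched in Python. This is a minimal illustration, not the dataset's documented schema: the field names `chosen` and `rejected` are assumed from the labels in the description, the split name `train` is an assumption, and `load_pairs`, `count_complete_pairs`, and `sample_rows` are hypothetical names introduced here. The loader relies on the Hugging Face `datasets` package and needs network access, so it is defined but not called.

```python
from typing import Any, Iterable, Mapping

# Dataset identifier as given in the Links field of this page.
DATASET_ID = "trl-lib/hh-rlhf-helpful-base"


def load_pairs(split: str = "train"):
    """Download one split from the Hugging Face Hub (needs network access).

    Assumes the `datasets` package is installed and that a split with
    this name exists; neither is confirmed by this page.
    """
    from datasets import load_dataset  # lazy import so the rest runs offline

    return load_dataset(DATASET_ID, split=split)


def count_complete_pairs(rows: Iterable[Mapping[str, Any]]) -> int:
    """Count rows that carry both a non-empty 'chosen' and 'rejected' entry."""
    return sum(1 for row in rows if row.get("chosen") and row.get("rejected"))


# Hypothetical in-memory rows, for illustration only (not real dataset content).
sample_rows = [
    {"chosen": "Here is a step-by-step answer.", "rejected": "I don't know."},
    {"chosen": "Sure, try restarting it.", "rejected": ""},
]
print(count_complete_pairs(sample_rows))  # prints 1
```

With network access, `count_complete_pairs(load_pairs())` would run the same check over a downloaded split, provided the assumed field names match the actual columns.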

Robots used

null

Links

HuggingFace dataset

trl-lib/hh-rlhf-helpful-base