Tags: sim · medium · locomotion · metric: varies
PrefMMT: Modeling Human Preferences in Preference-based Reinforcement Learning with Multimodal Transformers
Description
Preference-based reinforcement learning (PbRL) shows promise in aligning robot behaviors with human preferences, but its success depends heavily on accurately modeling human preferences through reward models. Most methods adopt Markovian assumptions for preference modeling (PM), which overlook the temporal dependencies within robot behavior trajectories that impact human evaluations. While recent works have used sequence modeling to mitigate this by learning sequential non-Markovian rewards …
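To make the preference-modeling setup concrete, a minimal sketch of the standard Bradley-Terry model commonly used in PbRL is shown below. It assumes a learned per-step reward and scores the probability that a human prefers trajectory segment σ₁ over σ₂ from the segments' summed rewards; the function name and inputs are illustrative, not from PrefMMT itself.

```python
import math

def preference_prob(rewards_1, rewards_2):
    """Bradley-Terry probability that segment 1 is preferred over segment 2.

    rewards_1, rewards_2: lists of per-step rewards predicted by a learned
    reward model for two trajectory segments. Under the (Markovian) model,
    P(sigma_1 > sigma_2) = exp(sum r1) / (exp(sum r1) + exp(sum r2)),
    computed here in a numerically stable logistic form.
    """
    score_diff = sum(rewards_2) - sum(rewards_1)
    return 1.0 / (1.0 + math.exp(score_diff))

# Example: segment 1 accumulates more predicted reward, so it is
# preferred with probability > 0.5.
p = preference_prob([1.0, 2.0], [0.5, 0.5])
```

Non-Markovian approaches, such as the multimodal transformer proposed here, replace the independent per-step rewards with sequence-conditioned ones, but the preference probability is typically still computed with this same Bradley-Terry form.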