sim · medium · offline-rl · metric: varies
Similarity as Reward Alignment: Robust and Versatile Preference-based Reinforcement Learning
Description
Preference-based Reinforcement Learning (PbRL) encompasses a variety of approaches for aligning models with human intent, alleviating the burden of reward engineering. However, most prior PbRL work has not investigated robustness to labeler errors, which are inevitable when labelers are non-experts or operate under time constraints. Additionally, PbRL algorithms often target very specific settings (e.g., pairwise ranked preferences or purely offline learning). We introduce Similarity as Reward Alignment.
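The pairwise ranked preferences mentioned above are commonly modeled with a Bradley-Terry objective, in which a learned reward (or return) function is fit so that the preferred segment receives higher value. The following is a minimal NumPy sketch of that standard objective for context; it illustrates generic pairwise PbRL, not the specific method this paper introduces, and all names are illustrative.

```python
import numpy as np

def bradley_terry_loss(r_a, r_b, pref):
    """Negative log-likelihood of pairwise preferences under the
    Bradley-Terry model: P(a preferred over b) = sigmoid(R(a) - R(b)).

    r_a, r_b : predicted returns for trajectory segments a and b
    pref     : 1.0 if the labeler preferred a, 0.0 if they preferred b
    """
    p_a = 1.0 / (1.0 + np.exp(-(r_a - r_b)))  # probability segment a is preferred
    eps = 1e-12  # numerical safety for the logarithm
    return -np.mean(pref * np.log(p_a + eps)
                    + (1.0 - pref) * np.log(1.0 - p_a + eps))
```

A reward model trained by minimizing this loss over labeled segment pairs can then supply rewards to a standard RL algorithm; labeler errors show up here as flipped `pref` values, which is one source of the fragility the description highlights.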