sim · medium · offline-rl · metric: varies
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization
Description
Offline reinforcement learning (offline RL) algorithms often require additional constraints or penalty terms to address distribution shift, such as adding implicit or explicit policy constraints during policy optimization to reduce the estimation bias of value functions. This paper focuses on a limitation of the Advantage-Weighted Regression (AWR) family: the potential for learning over-conservative policies due to data corruption, specifically the poor exploration present in suboptimal offline datasets.
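To make the constraint mechanism concrete, here is a minimal sketch of the exponential advantage weighting that AWR-style methods apply during policy regression. The temperature `beta` and the clip value `w_max` are illustrative choices for this sketch, not settings from the paper; the weight clip is a common practical safeguard against outlier transitions, which is the kind of corruption-sensitivity CAWR targets.

```python
import numpy as np

def awr_weights(advantages, beta=1.0, w_max=20.0):
    """Exponential advantage weights used in AWR-style policy regression.

    Transitions with higher advantage receive exponentially more weight
    when regressing the policy toward their actions; clipping at w_max
    (an illustrative safeguard, not the paper's method) limits the
    influence of any single outlier transition.
    """
    return np.minimum(np.exp(np.asarray(advantages) / beta), w_max)

# Toy batch: a low-advantage ("suboptimal") transition gets near-zero
# weight, while a high-advantage transition dominates the regression.
adv = np.array([-3.0, 0.0, 2.0])
print(awr_weights(adv, beta=1.0))
```

In a full AWR update these weights would multiply the negative log-likelihood of each dataset action under the policy, so low-advantage data is effectively down-weighted rather than hard-filtered.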