sim · medium · offline-rl · metric: varies
CAWR: Corruption-Averse Advantage-Weighted Regression for Robust Policy Optimization
Description
Offline reinforcement learning (offline RL) algorithms often require additional constraints or penalty terms to address distribution shift, such as adding implicit or explicit policy constraints during policy optimization to reduce the estimation bias of value functions. This paper focuses on a limitation of the Advantage-Weighted Regression (AWR) family: the potential for learning over-conservative policies due to data corruption, specifically the poor exploration present in suboptimal offline datasets.
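To make the constraint mechanism concrete, here is a minimal sketch of the exponential advantage weighting that AWR-style methods apply during policy regression. The temperature `beta` and the clip value `w_max` are illustrative choices for this sketch, not settings from the paper; the weight clip is a common practical safeguard against outlier transitions, which is the kind of corruption-sensitivity CAWR targets.

```python
import numpy as np

def awr_weights(advantages, beta=1.0, w_max=20.0):
    """Exponential advantage weights used in AWR-style policy regression.

    Transitions with higher advantage receive exponentially more weight
    when regressing the policy toward their actions; clipping at w_max
    (an illustrative safeguard, not the paper's method) limits the
    influence of any single outlier transition.
    """
    return np.minimum(np.exp(np.asarray(advantages) / beta), w_max)

# Toy batch: a low-advantage ("suboptimal") transition gets near-zero
# weight, while a high-advantage transition dominates the regression.
adv = np.array([-3.0, 0.0, 2.0])
print(awr_weights(adv, beta=1.0))
```

In a full AWR update these weights would multiply the negative log-likelihood of each dataset action under the policy, so low-advantage data is effectively down-weighted rather than hard-filtered.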