sim · medium · offline-rl · metric: varies
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
Description
Offline reinforcement learning (RL) aims to learn an effective policy from a static dataset. To alleviate extrapolation errors, existing studies often regularize the value function or policy updates uniformly across all states. However, because data quality varies substantially from state to state, a fixed regularization strength leads to a dilemma: weak regularization fails to address extrapolation errors and value overestimation, while strong regularization shifts policy learning toward behavior cloning, limiting the improvement that Bellman updates could otherwise provide.
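
To make the dilemma concrete, below is a minimal sketch (not the paper's algorithm) of a TD3+BC-style regularized policy objective, where a behavior-cloning penalty is weighted either by a single fixed scalar or by a hypothetical per-state weight; the function name, the weight values, and the idea of deriving weights from local data quality are illustrative assumptions.

```python
import torch

def regularized_policy_loss(q_values, policy_actions, dataset_actions, weight):
    """Policy loss with a behavior-cloning regularizer.

    q_values:        Q(s, pi(s)),            shape [batch]
    policy_actions:  pi(s),                  shape [batch, act_dim]
    dataset_actions: actions from the data,  shape [batch, act_dim]
    weight:          scalar (uniform regularization) or per-state tensor [batch]
    """
    # Behavior-cloning term: squared distance between policy and dataset actions.
    bc_term = ((policy_actions - dataset_actions) ** 2).mean(dim=-1)
    # Larger weight pushes the policy toward behavior cloning;
    # smaller weight trusts the (possibly overestimated) Q-values more.
    return (-q_values + weight * bc_term).mean()


batch, act_dim = 4, 2
q = torch.randn(batch)
pi_a = torch.randn(batch, act_dim)
data_a = torch.randn(batch, act_dim)

# Uniform regularization: one strength for every state (the fixed-strength dilemma).
fixed_loss = regularized_policy_loss(q, pi_a, data_a, weight=2.5)

# Hypothetical state-adaptive weights, e.g. stronger where the local data is poor.
adaptive_w = torch.tensor([0.5, 3.0, 1.0, 2.0])
adaptive_loss = regularized_policy_loss(q, pi_a, data_a, weight=adaptive_w)
```

With a single scalar, every state is pulled toward the dataset with the same force; a per-state weight is one way to relax that constraint where the data is trustworthy and tighten it where it is not.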