sim · medium · offline-rl · metric: varies
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
Description
Offline reinforcement learning (RL) aims to learn an effective policy from a static dataset. To alleviate extrapolation errors, existing studies often regularize the value function or policy updates uniformly across all states. However, because data quality varies substantially from state to state, a fixed regularization strength leads to a dilemma: weak regularization fails to address extrapolation errors and value overestimation, while strong regularization shifts policy learning toward behavior cloning, limiting the improvement that Bellman updates could otherwise provide.
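
To make the dilemma concrete, below is a minimal sketch (not the paper's algorithm) of a TD3+BC-style regularized policy objective, where a behavior-cloning penalty is weighted either by a single fixed scalar or by a hypothetical per-state weight; the function name, the weight values, and the idea of deriving weights from local data quality are illustrative assumptions.

```python
import torch

def regularized_policy_loss(q_values, policy_actions, dataset_actions, weight):
    """Policy loss with a behavior-cloning regularizer.

    q_values:        Q(s, pi(s)),            shape [batch]
    policy_actions:  pi(s),                  shape [batch, act_dim]
    dataset_actions: actions from the data,  shape [batch, act_dim]
    weight:          scalar (uniform regularization) or per-state tensor [batch]
    """
    # Behavior-cloning term: squared distance between policy and dataset actions.
    bc_term = ((policy_actions - dataset_actions) ** 2).mean(dim=-1)
    # Larger weight pushes the policy toward behavior cloning;
    # smaller weight trusts the (possibly overestimated) Q-values more.
    return (-q_values + weight * bc_term).mean()


batch, act_dim = 4, 2
q = torch.randn(batch)
pi_a = torch.randn(batch, act_dim)
data_a = torch.randn(batch, act_dim)

# Uniform regularization: one strength for every state (the fixed-strength dilemma).
fixed_loss = regularized_policy_loss(q, pi_a, data_a, weight=2.5)

# Hypothetical state-adaptive weights, e.g. stronger where the local data is poor.
adaptive_w = torch.tensor([0.5, 3.0, 1.0, 2.0])
adaptive_loss = regularized_policy_loss(q, pi_a, data_a, weight=adaptive_w)
```

With a single scalar, every state is pulled toward the dataset with the same force; a per-state weight is one way to relax that constraint where the data is trustworthy and tighten it where it is not.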