sim · medium · offline-rl · metric: varies

Mildly Conservative Regularized Evaluation for Offline Reinforcement Learning

Description

Offline reinforcement learning (RL) seeks to learn optimal policies from static datasets without further environment interaction. A key challenge is the distribution shift between the learned policy and the behavior policy, which leads to out-of-distribution (OOD) actions and value overestimation. To prevent gross overestimation, the value function must remain conservative; however, excessive conservatism may hinder performance improvement. To address this, we propose mildly conservative regularized evaluation.
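To make the conservatism idea concrete, here is a minimal tabular sketch of one common way to regularize value evaluation in offline RL: a standard TD error plus a CQL-style log-sum-exp penalty that pushes Q-values down on all actions while pushing up the action observed in the dataset, so OOD actions are not overestimated. The function name, the penalty form, and the weight `alpha` are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def conservative_td_loss(Q, s, a, r, s_next, gamma=0.99, alpha=1.0):
    """Tabular sketch of a conservatively regularized TD loss.

    Q       : (n_states, n_actions) tabular value estimates
    (s, a)  : state-action pair observed in the offline dataset
    r       : reward for that transition
    s_next  : next state
    """
    # Bootstrapped TD target from the next state's greedy value.
    target = r + gamma * np.max(Q[s_next])
    td_error = (Q[s, a] - target) ** 2
    # Conservative penalty: log-sum-exp over all actions minus the
    # value of the dataset action; minimizing it lowers Q on
    # unobserved (potentially OOD) actions relative to observed ones.
    penalty = np.log(np.sum(np.exp(Q[s]))) - Q[s, a]
    return td_error + alpha * penalty

# With all-zero Q, one reward-1 transition gives a TD error of 1
# and a penalty of log(n_actions).
Q = np.zeros((4, 2))
loss = conservative_td_loss(Q, s=0, a=1, r=1.0, s_next=2)
```

Tuning `alpha` trades off conservatism against performance: large values suppress OOD actions aggressively but can make the value estimate overly pessimistic, which is the tension the abstract describes.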

Source

http://arxiv.org/abs/2508.05960v1