sim · medium · offline-rl · metric · varies

Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood

Description

Offline Reinforcement Learning (RL) struggles with distributional shift, which leads to $Q$-value overestimation for out-of-distribution (OOD) actions. Existing methods address this issue by imposing constraints; however, they often become overly conservative when evaluating OOD regions, which limits the generalization of the $Q$-function. This over-constraint results in poor $Q$-value estimation and hinders policy improvement. In this paper, we introduce a novel approach to achieve better $

Source

http://arxiv.org/abs/2506.08417v1