sim · medium · offline-rl · metric · varies

Adaptive Neighborhood-Constrained Q Learning for Offline Reinforcement Learning

Description

Offline reinforcement learning (RL) suffers from extrapolation errors induced by out-of-distribution (OOD) actions. To address this, offline RL algorithms typically impose constraints on action selection, which can be systematically categorized into density, support, and sample constraints. However, we show that each category has inherent limitations: density and sample constraints tend to be overly conservative in many scenarios, while the support constraint, though least restrictive, faces cha
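The three constraint categories named in the description can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes a single state with five discrete actions, and all names and numbers (`q_values`, `behavior_probs`, `dataset_actions`, `temperature`) are hypothetical. It also assumes one common formalization of each category: a KL-regularized policy for the density constraint, a positive-probability mask for the support constraint, and a restriction to logged actions for the sample constraint.

```python
import numpy as np

# Hypothetical toy setup: one state, five discrete actions.
# Action 4 is out-of-distribution and its Q-value is assumed to be overestimated.
q_values = np.array([1.0, 2.0, 5.0, 3.0, 9.0])        # learned Q(s, a)
behavior_probs = np.array([0.6, 0.3, 0.1, 0.0, 0.0])  # behavior policy pi_beta(a | s)
dataset_actions = np.array([0, 0, 1])                 # actions actually logged for this state


def unconstrained_action(q):
    """Greedy action with no constraint; may select an OOD action."""
    return int(np.argmax(q))


def support_constrained_action(q, probs, threshold=0.0):
    """Greedy action among those the behavior policy can take (prob > threshold)."""
    masked = np.where(probs > threshold, q, -np.inf)
    return int(np.argmax(masked))


def sample_constrained_action(q, logged):
    """Greedy action among those that literally appear in the dataset."""
    candidates = np.unique(logged)
    return int(candidates[np.argmax(q[candidates])])


def density_constrained_policy(q, probs, temperature=1.0):
    """Maximizer of E_pi[Q] - temperature * KL(pi || pi_beta).

    The closed form is pi(a) proportional to pi_beta(a) * exp(Q(a) / temperature).
    Subtracting q.max() only improves numerical stability.
    """
    weights = probs * np.exp((q - q.max()) / temperature)
    return weights / weights.sum()


if __name__ == "__main__":
    print(unconstrained_action(q_values))                          # 4 (OOD action)
    print(support_constrained_action(q_values, behavior_probs))    # 2
    print(sample_constrained_action(q_values, dataset_actions))    # 1
    strong = density_constrained_policy(q_values, behavior_probs, temperature=10.0)
    print(int(np.argmax(strong)))                                  # 0
```

In this toy setting the support constraint recovers the best in-support action (2), the sample constraint is limited to logged actions and picks 1, and a strongly weighted density constraint stays near the behavior policy's mode (0), which matches the description's claim that density and sample constraints can be more conservative than the support constraint. Note that the support mask here uses the true `behavior_probs`; in practice these would have to be estimated from data.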

Source

http://arxiv.org/abs/2511.02567v1