sim · medium · offline-rl · metric · varies
Proximal Action Replacement for Behavior Cloning Actor-Critic in Offline Reinforcement Learning
Description
Offline reinforcement learning (RL) optimizes policies from a previously collected static dataset and is an important branch of RL. A popular and promising approach is to regularize actor-critic methods with behavior cloning (BC), which yields realistic policies and mitigates bias from out-of-distribution actions. However, this regularization imposes an often-overlooked performance ceiling: when dataset actions are suboptimal, indiscriminate imitation structurally prevents the actor from fully exploiting high-value regions of the action space.
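To make the BC-regularized actor objective concrete, here is a minimal NumPy sketch of one common instantiation (a TD3+BC-style actor loss). The function name, the `alpha` default, and the normalization constant are illustrative assumptions, not part of this benchmark's specification:

```python
import numpy as np

def bc_regularized_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """TD3+BC-style actor objective: maximize Q while imitating the dataset.

    q_values:        Q(s, pi(s)) for a batch of states, shape (B,)
    policy_actions:  actions proposed by the actor, shape (B, act_dim)
    dataset_actions: actions logged in the offline dataset, shape (B, act_dim)
    alpha:           trade-off weight; lam = alpha / mean|Q| keeps the Q term
                     on a comparable scale to the BC (MSE) term.
    """
    lam = alpha / (np.abs(q_values).mean() + 1e-8)
    # BC term: mean squared error between policy and dataset actions.
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)
    # Minimize negative Q (i.e. maximize value) plus the imitation penalty.
    return -lam * q_values.mean() + bc_term
```

The ceiling discussed above is visible directly in this objective: if `dataset_actions` are suboptimal, the BC term penalizes any deviation from them, even deviations that the critic scores highly.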