sim · medium · offline-rl · metric · varies
Proximal Action Replacement for Behavior Cloning Actor-Critic in Offline Reinforcement Learning
Description
Offline reinforcement learning (RL) optimizes policies from a previously collected static dataset and is an important branch of RL. A popular and promising approach is to regularize actor-critic methods with behavior cloning (BC), which yields realistic policies and mitigates bias from out-of-distribution actions. However, this regularization imposes an often-overlooked performance ceiling: when dataset actions are suboptimal, indiscriminate imitation structurally prevents the actor from fully exploiting high-value regions of the action space.
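To make the BC-regularized actor objective concrete, here is a minimal NumPy sketch of one common instantiation (a TD3+BC-style actor loss). The function name, the `alpha` default, and the normalization constant are illustrative assumptions, not part of this benchmark's specification:

```python
import numpy as np

def bc_regularized_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """TD3+BC-style actor objective: maximize Q while imitating the dataset.

    q_values:        Q(s, pi(s)) for a batch of states, shape (B,)
    policy_actions:  actions proposed by the actor, shape (B, act_dim)
    dataset_actions: actions logged in the offline dataset, shape (B, act_dim)
    alpha:           trade-off weight; lam = alpha / mean|Q| keeps the Q term
                     on a comparable scale to the BC (MSE) term.
    """
    lam = alpha / (np.abs(q_values).mean() + 1e-8)
    # BC term: mean squared error between policy and dataset actions.
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)
    # Minimize negative Q (i.e. maximize value) plus the imitation penalty.
    return -lam * q_values.mean() + bc_term
```

The ceiling discussed above is visible directly in this objective: if `dataset_actions` are suboptimal, the BC term penalizes any deviation from them, even deviations that the critic scores highly.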