sim · medium · offline-rl · metric · varies

Behavior-Adaptive Q-Learning: A Unifying Framework for Offline-to-Online RL

Description

Offline reinforcement learning (RL) enables training from fixed data without online interaction, but policies learned offline often struggle when deployed in dynamic environments due to distributional shift and unreliable value estimates on unseen state-action pairs. We introduce Behavior-Adaptive Q-Learning (BAQ), a framework designed to enable a smooth and reliable transition from offline to online RL. The key idea is to leverage an implicit behavioral model derived from offline data to provid

Source

http://arxiv.org/abs/2511.03695v1