sim · medium · offline-rl · metric · varies

MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning

Description

We study off-dynamics offline reinforcement learning, where the goal is to learn a policy from offline source and limited target datasets with mismatched dynamics. Existing methods either penalize the reward or discard source transitions occurring in parts of the transition space with high dynamics shift. As a result, they optimize the policy using data from low-shift regions, limiting exploration of high-reward states in the target domain that do not fall within these regions. Consequently, suc
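To make the limitation described above concrete, here is a minimal sketch of the two existing strategies the description mentions: penalizing rewards and discarding high-shift source transitions. It is not the MOBODY method or any specific published baseline. The per-transition `shift` scores, the function names, and the `penalty_coef` and `keep_fraction` parameters are all hypothetical; in practice the shift would have to be estimated, for example with learned domain classifiers.

```python
# Illustrative sketch only: `shift` holds hypothetical per-transition
# estimates of dynamics mismatch between source and target domains.

def penalize_rewards(rewards, shift, penalty_coef=1.0):
    """Reward-penalty strategy: subtract a penalty proportional to the shift."""
    return [r - penalty_coef * s for r, s in zip(rewards, shift)]


def filter_low_shift(transitions, shift, keep_fraction=0.5):
    """Filtering strategy: keep only the source transitions with the lowest shift."""
    n_keep = max(1, round(keep_fraction * len(shift)))
    # Indices ordered from lowest to highest estimated shift.
    order = sorted(range(len(shift)), key=lambda i: shift[i])
    keep_idx = sorted(order[:n_keep])  # restore original ordering
    return [transitions[i] for i in keep_idx], keep_idx


# Toy source data: the two highest-reward transitions also have the highest shift.
rewards = [1.0, 5.0, 0.5, 4.0]
shift = [0.1, 2.0, 0.2, 1.5]
transitions = ["t0", "t1", "t2", "t3"]

penalized = penalize_rewards(rewards, shift, penalty_coef=1.0)
kept, kept_idx = filter_low_shift(transitions, shift, keep_fraction=0.5)
```

In this toy data the filter keeps only `t0` and `t2`, so the two high-reward transitions never reach the policy update. This is the kind of restriction to low-shift regions that the description points to.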

Source

http://arxiv.org/abs/2506.08460v3