sim · medium · imitation · metric · varies

Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation

Description

Training a policy in a source domain for deployment in a target domain under a dynamics shift is challenging and often results in performance degradation. Previous work tackles this challenge by training in the source domain with modified rewards, derived by matching the distributions of source trajectories and target optimal trajectories. However, modified rewards alone only ensure that the behavior of the learned policy in the source domain resembles trajectories produced by the target optimal polic
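To make the idea of a modified reward concrete, here is a minimal sketch. The description above does not give the exact form of the modification, so this assumes one common choice: adding the log-ratio of target to source transition probabilities to the source reward. The 1-D Gaussian dynamics and the names `source_mean`, `target_mean`, `reward_correction`, and `modified_reward` are hypothetical and for illustration only. In practice the dynamics are unknown and the ratio is usually estimated, for example with learned classifiers.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log-density of a 1-D Gaussian with the given mean and std."""
    return -0.5 * math.log(2.0 * math.pi * std**2) - (x - mean) ** 2 / (2.0 * std**2)


def source_mean(s, a):
    # Hypothetical source dynamics: s' = s + a + noise.
    return s + a


def target_mean(s, a):
    # Hypothetical target dynamics: actions are half as effective.
    return s + 0.5 * a


def reward_correction(s, a, s_next, std=0.1):
    """Assumed correction: log p_target(s'|s,a) - log p_source(s'|s,a)."""
    log_p_target = gaussian_log_pdf(s_next, target_mean(s, a), std)
    log_p_source = gaussian_log_pdf(s_next, source_mean(s, a), std)
    return log_p_target - log_p_source


def modified_reward(r, s, a, s_next, std=0.1):
    """Source reward plus the dynamics-mismatch correction."""
    return r + reward_correction(s, a, s_next, std)


if __name__ == "__main__":
    # A transition that is likely under the target dynamics gets a bonus;
    # one that is only plausible in the source gets a penalty.
    print(reward_correction(0.0, 1.0, 0.5))  # positive
    print(reward_correction(0.0, 1.0, 1.0))  # negative
```

Under this assumed form, transitions that would be unlikely in the target domain are penalized, which steers source-domain behavior toward trajectories that are feasible in the target. It does not by itself guarantee good performance once the policy is deployed in the target domain, which is the limitation the description raises.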

Source

http://arxiv.org/abs/2411.09891v1