sim · medium · humanoid · metric · varies

Task-Centric Policy Optimization from Misaligned Motion Priors

Description

Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, causing naïve imitation to degrade task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. This exposes a fundamental limitation of

Source

http://arxiv.org/abs/2601.19411v2