sim · medium · imitation · metric · varies

From Generated Human Videos to Physically Plausible Robot Trajectories

Description

Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, holding the potential to serve as high-level planners for contextual robot control. To realize this potential, a key research question remains open: how can a humanoid execute the human actions from generated videos in a zero-shot manner? This challenge arises because generated videos are often noisy and exhibit morphological distortions that make direct imitation difficult compared to r

Source

http://arxiv.org/abs/2512.05094v2