sim · medium · robotics · metric · varies

EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards

Description

Video generative models are increasingly used as world models for robotics, where a model generates a future visual rollout conditioned on the current observation and task instruction, and an inverse dynamics model (IDM) converts the generated frames into executable robot actions. However, current video world models lack explicit executability constraints. As a result, visually coherent rollouts may still violate rigid-body and kinematic consistency, producing unstable or infeasible control comm
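
The pipeline described above (generated frames, then an inverse dynamics model, then actions) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: frames are reduced to end-effector coordinates, `toy_idm` is a hypothetical stand-in for a learned IDM, and the limits `max_step` and `max_jump` are arbitrary values chosen to show how a visually plausible rollout can still decode into infeasible commands.

```python
from typing import List, Sequence

# Hypothetical stand-in for a video frame: end-effector coordinates in metres.
Frame = Sequence[float]


def toy_idm(prev: Frame, nxt: Frame) -> List[float]:
    """Toy inverse dynamics: the action is the displacement between two frames."""
    return [b - a for a, b in zip(prev, nxt)]


def rollout_to_actions(frames: Sequence[Frame]) -> List[List[float]]:
    """Decode one action per consecutive pair of frames in the rollout."""
    return [toy_idm(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


def is_executable(
    actions: Sequence[Sequence[float]],
    max_step: float = 0.05,  # assumed per-step displacement limit
    max_jump: float = 0.03,  # assumed limit on change between successive actions
) -> bool:
    """Return False if any action is too large or changes too abruptly."""
    for i, act in enumerate(actions):
        if any(abs(a) > max_step for a in act):
            return False
        if i > 0 and any(abs(a - p) > max_jump for a, p in zip(act, actions[i - 1])):
            return False
    return True


# A smooth rollout and one containing a sudden jump between frames.
smooth = [[0.00, 0.0], [0.02, 0.0], [0.04, 0.0], [0.06, 0.0]]
jumpy = [[0.00, 0.0], [0.02, 0.0], [0.30, 0.0], [0.32, 0.0]]

print(is_executable(rollout_to_actions(smooth)))
print(is_executable(rollout_to_actions(jumpy)))
```

In this sketch the feasibility check is a hand-written rule. The title suggests the IDM is instead used as a reward signal for aligning the video model, which this toy example does not attempt to reproduce.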

Source

http://arxiv.org/abs/2603.17808v2