sim · medium · robotics · metric: varies
EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards
Description
Video generative models are increasingly used as world models for robotics, where a model generates a future visual rollout conditioned on the current observation and task instruction, and an inverse dynamics model (IDM) converts the generated frames into executable robot actions. However, current video world models lack explicit executability constraints. As a result, visually coherent rollouts may still violate rigid-body and kinematic consistency, producing unstable or infeasible control comm