sim · medium · robotics · metric · varies

World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry

Description

General-purpose world models promise scalable policy evaluation, optimization, and planning, yet achieving the required level of robustness remains challenging. Unlike policy learning, which primarily focuses on optimal actions, a world model must be reliable over a much broader range of suboptimal actions, which are often insufficiently covered by action-labeled interaction data. To address this challenge, we propose World Action Verifier (WAV), a framework that enables world models to identify

Source

http://arxiv.org/abs/2604.01985v1