simmediummanipulation-datametric · varies

VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model

Description

The goal of this paper is to improve the performance and reliability of vision-language-action (VLA) models through iterative online interaction. Since collecting policy rollouts in the real world is expensive, we investigate whether a learned simulator (specifically, an action-conditioned video generation model) can be used to generate additional rollout data. Unfortunately, existing world models lack the physical fidelity necessary for policy improvement: they are predominantly trained on demons

Source

http://arxiv.org/abs/2602.12063v2