sim · medium · manipulation-data · metric · varies
VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model
Description
The goal of this paper is to improve the performance and reliability of vision-language-action (VLA) models through iterative online interaction. Since collecting policy rollouts in the real world is expensive, we investigate whether a learned simulator, specifically an action-conditioned video generation model, can be used to generate additional rollout data. Unfortunately, existing world models lack the physical fidelity necessary for policy improvement: they are predominantly trained on demons