sim · medium · robotics · metric · varies

Do World Action Models Generalize Better than VLAs? A Robustness Study

Description

Robot action planning in the real world is challenging because it requires not only understanding the current state of the environment but also predicting how that state will evolve in response to actions. Vision-language-action (VLA) models, which repurpose large-scale vision-language models for robot action generation using action experts, have achieved notable success across a variety of robotic tasks. Nevertheless, their performance remains constrained by the scope of their training data, exhibiting limited generalization …

Source

http://arxiv.org/abs/2603.22078v2