sim · medium · manipulation · metric · varies

Restoring Linguistic Grounding in VLA Models via Train-Free Attention Recalibration

Description

Vision-Language-Action (VLA) models enable robots to perform manipulation tasks directly from natural language instructions and are increasingly viewed as a foundation for generalist robotic policies. However, their reliability under Out-of-Distribution (OOD) instructions remains underexplored. In this paper, we reveal a critical failure mode in which VLA policies continue executing visually plausible actions even when the language instruction contradicts the scene. We refer to this phenomenon a

Source

http://arxiv.org/abs/2603.06001v1