sim · medium · imitation · metric · varies

From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models

Description

One promise that Vision-Language-Action (VLA) models hold over traditional imitation learning for robotics is to leverage the broad generalization capabilities of large Vision-Language Models (VLMs) to produce versatile, "generalist" robot policies. However, current evaluations of VLAs remain insufficient. Traditional imitation learning benchmarks are unsuitable due to the lack of language instructions. Emerging benchmarks for VLAs that incorporate language often come with limited evaluation tas

Source

http://arxiv.org/abs/2506.09930v1