sim · medium · manipulation · metric · varies

Benchmarking Affordance Generalization with BusyBox

Description

Vision-Language-Action (VLA) models have attracted the attention of researchers and practitioners thanks to their promise of generalization. Although single-task policies still offer competitive performance, VLAs are increasingly able to handle commands and environments unseen in their training set. While generalization in vision and language space is undoubtedly important for robust, versatile behaviors, a key meta-skill VLAs need to possess is affordance generalization -- the ability to m

Source

http://arxiv.org/abs/2602.05441v1