sim · medium · robotics · metric · varies

VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models

Description

While Vision-Language-Action models (VLAs) are rapidly advancing towards generalist robot policies, it remains difficult to quantitatively understand their limits and failure modes. To address this, we introduce a comprehensive benchmark called VLA-Arena. We propose a novel structured task design framework to quantify difficulty across three orthogonal axes: (1) Task Structure, (2) Language Command, and (3) Visual Observation. This allows us to systematically design tasks with fine-grained diffi

Source

http://arxiv.org/abs/2512.22539v1