← Back to Benchmarks
simmediumvision-robotmetric · varies

ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models

Description

A recent trend in vision-language models (VLMs) has been to enhance their spatial cognition for embodied domains. Despite progress, existing evaluations have been limited both in paradigm and in coverage, hindering rapid, iterative model development. To address these limitations, we propose ESPIRE, a diagnostic benchmark for embodied spatial reasoning. ESPIRE offers a simulated world that physically grounds VLMs and evaluates them on spatial-reasoning-centric robotic tasks, thus narrowing the ga

Source

http://arxiv.org/abs/2603.13033v1