sim, medium, robotics, metric · varies

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence

Description

Multimodal models have achieved remarkable progress in recent years. Nevertheless, they continue to exhibit notable limitations in spatial understanding and reasoning, the very capability that anchors artificial general intelligence in the physical world. With the recent release of GPT-5, allegedly the most powerful AI model to date, it is timely to examine where the leading models (GPT, Gemini, Grok, Seed, Qwen, and Intern) stand on the path toward spatial intelligence (SI). We thus propose EAS

Source

http://arxiv.org/abs/2508.13142v5