← Back to Benchmarks
simmediumvision-robotmetric · varies

PanoEnv: Exploring 3D Spatial Intelligence in Panoramic Environments with Reinforcement Learning

Description

360 panoramic images are increasingly used in virtual reality, autonomous driving, and robotics for holistic scene understanding. However, current Vision-Language Models (VLMs) struggle with 3D spatial reasoning on Equirectangular Projection (ERP) images due to geometric distortion and limited 3D supervision. We introduce PanoEnv, a large-scale VQA benchmark built from synthetic 3D environments, containing 14.8K questions across five categories (e.g., relative position, volume comparison) ground

Source

http://arxiv.org/abs/2602.21992v1