sim · medium · vision-robot · metric · varies
PanoEnv: Exploring 3D Spatial Intelligence in Panoramic Environments with Reinforcement Learning
Description
360° panoramic images are increasingly used in virtual reality, autonomous driving, and robotics for holistic scene understanding. However, current Vision-Language Models (VLMs) struggle with 3D spatial reasoning on Equirectangular Projection (ERP) images due to geometric distortion and limited 3D supervision. We introduce PanoEnv, a large-scale VQA benchmark built from synthetic 3D environments, containing 14.8K questions across five categories (e.g., relative position, volume comparison) ground