sim · medium · navigation · metric · varies

SPAN-Nav: Generalized Spatial Awareness for Versatile Vision-Language Navigation

Description

Recent embodied navigation approaches leveraging Vision-Language Models (VLMs) demonstrate strong generalization in versatile Vision-Language Navigation (VLN). However, reliable path planning in complex environments remains challenging due to insufficient spatial awareness. In this work, we introduce SPAN-Nav, an end-to-end foundation model designed to infuse embodied navigation with universal 3D spatial awareness using RGB video streams. SPAN-Nav extracts spatial priors across diverse scenes th

Source

http://arxiv.org/abs/2603.09163v1