sim medium navigation metric · varies

TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation

Description

Vision-Language Navigation (VLN) presents a unique challenge for Large Vision-Language Models (VLMs) due to their inherent architectural mismatch: VLMs are primarily pretrained on static, disembodied vision-language tasks, which fundamentally clash with the dynamic, embodied, and spatially-structured nature of navigation. Existing large-model-based methods often resort to converting rich visual and spatial information into text, forcing models to implicitly infer complex visual-topological relat

Source

http://arxiv.org/abs/2603.02972v1