sim · medium · navigation · metric: varies

ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation

Description

Vision-and-Language Navigation (VLN) requires agents to accurately perceive complex visual environments and to reason over both navigation instructions and their own navigation histories. However, existing methods passively process redundant visual inputs and treat all historical context indiscriminately, which leads to inefficient perception and unfocused reasoning. To address these challenges, we propose ProFocus, a training-free progressive framework that unifies Proactive Perception and Focused Reasoning …

Source

http://arxiv.org/abs/2603.05530v2