sim · medium · navigation · metric · varies
ProFocus: Proactive Perception and Focused Reasoning in Vision-and-Language Navigation
Description
Vision-and-Language Navigation (VLN) requires agents to accurately perceive complex visual environments and reason over navigation instructions and histories. However, existing methods passively process redundant visual inputs and treat all historical contexts indiscriminately, resulting in inefficient perception and unfocused reasoning. To address these challenges, we propose \textbf{ProFocus}, a training-free progressive framework that unifies \underline{Pro}active Perception and \underline{Fo