sim · medium · robotics · metric · varies

P$^{3}$Nav: End-to-End Perception, Prediction and Planning for Vision-and-Language Navigation

Description

In Vision-and-Language Navigation (VLN), an agent is required to plan a path to the target specified by the language instruction, using its visual observations. Consequently, prevailing VLN methods primarily focus on building powerful planners through visual-textual alignment. However, these approaches often bypass the imperative of comprehensive scene understanding prior to planning, leaving the agent with insufficient perception or prediction capabilities. Thus, we propose P$^{3}$Nav, a novel

Source

http://arxiv.org/abs/2603.17459v1