sim · medium · robotics · metric: varies

AgentVLN: Towards Agentic Vision-and-Language Navigation

Description

Vision-and-Language Navigation (VLN) requires an embodied agent to ground complex natural-language instructions into long-horizon navigation in unseen environments. While Vision-Language Models (VLMs) offer strong 2D semantic understanding, current VLN systems remain constrained by limited spatial perception, 2D-3D representation mismatch, and monocular scale ambiguity. In this paper, we propose AgentVLN, a novel and efficient embodied navigation framework that can be deployed on edge computing devices.

Source

http://arxiv.org/abs/2603.17670v1