sim · medium · robotics · metric · varies

Structured Observation Language for Efficient and Generalizable Vision-Language Navigation

Description

Vision-Language Navigation (VLN) requires an embodied agent to navigate complex environments by following natural language instructions, which typically demands tight fusion of the visual and language modalities. Existing VLN methods often convert raw images into visual tokens or implicit features; as a result, they require large-scale visual pre-training and generalize poorly under environmental variations (e.g., lighting, texture). To address these issues, we propose SOL-Nav (Structured Observation…

Source

http://arxiv.org/abs/2603.27577v1