sim · medium · robotics · metric · varies

Language-Conditioned World Modeling for Visual Navigation

Description

We study language-conditioned visual navigation (LCVN), in which an embodied agent is asked to follow a natural language instruction based only on an initial egocentric observation. Without access to goal images, the agent must rely on language to shape its perception and continuous control, making the grounding problem particularly challenging. We formulate this problem as open-loop trajectory prediction conditioned on linguistic instructions and introduce the LCVN Dataset, a benchmark of 39,01
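
To make the open-loop formulation concrete, here is a minimal sketch in Python. It is not the authors' method. The `(linear velocity, angular velocity)` action space, the unicycle kinematics, and the names `rollout_open_loop`, `predict_open_loop`, and `toy_policy` are all assumptions made for illustration. The point it shows is that the policy sees the initial observation and the instruction once, commits to a full action sequence, and receives no further observations while the trajectory unfolds.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical action and pose types; the source does not specify them.
Action = Tuple[float, float]       # (linear velocity, angular velocity)
Pose = Tuple[float, float, float]  # (x, y, heading in radians)
Policy = Callable[[Any, str, int], Sequence[Action]]


def rollout_open_loop(actions: Sequence[Action],
                      start: Pose = (0.0, 0.0, 0.0),
                      dt: float = 1.0) -> List[Pose]:
    """Integrate a fixed action sequence with unicycle kinematics.

    No observation is read inside the loop, which is what makes
    the rollout open-loop.
    """
    x, y, theta = start
    poses = [start]
    for v, omega in actions:
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += omega * dt
        poses.append((x, y, theta))
    return poses


def predict_open_loop(policy: Policy, observation: Any,
                      instruction: str, horizon: int) -> List[Pose]:
    """Query the policy once, then roll out the whole trajectory."""
    actions = policy(observation, instruction, horizon)
    if len(actions) != horizon:
        raise ValueError("policy must return exactly `horizon` actions")
    return rollout_open_loop(actions)


def toy_policy(observation: Any, instruction: str,
               horizon: int) -> List[Action]:
    """Stand-in for a learned model: keyword matching on the instruction."""
    text = instruction.lower()
    turn = 0.1 if "left" in text else -0.1 if "right" in text else 0.0
    return [(1.0, turn)] * horizon


poses = predict_open_loop(toy_policy, observation=None,
                          instruction="go forward", horizon=3)
```

A closed-loop variant would instead call the policy inside the loop with a fresh observation at every step. Keeping the single policy call outside the rollout is the design choice that distinguishes the two.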

Source

http://arxiv.org/abs/2603.26741v1