sim · medium · navigation · metric · varies

ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

Description

Visual navigation is a fundamental capability for autonomous home-assistance robots, enabling long-horizon tasks such as object search. While recent methods have leveraged Large Language Models (LLMs) to incorporate commonsense reasoning and improve exploration efficiency, their planning remains constrained by textual representations, which cannot adequately capture spatial occupancy or scene geometry--critical factors for navigation decisions. We explore whether Vision-Language Models (VLMs) ca

Source

http://arxiv.org/abs/2512.17435v2