World-aware Planning Narratives Enhance Large Vision-Language Model Planner

Description

Large Vision-Language Models (LVLMs) show promise for embodied planning tasks but struggle with complex scenarios involving unfamiliar environments and multi-step goals. Current approaches rely on environment-agnostic imitation learning that disconnects instructions from environmental contexts, causing models to struggle with context-sensitive instructions and rely on supplementary cues rather than visual reasoning during long-horizon interactions. In this work, we propose World-Aware Planning N

Source

http://arxiv.org/abs/2506.21230v2