sim · medium · navigation · metric · varies

CoINS: Counterfactual Interactive Navigation via Skill-Aware VLM

Description

Recent Vision-Language Models (VLMs) have demonstrated significant potential in robotic planning. However, they typically function as semantic reasoners, lacking an intrinsic understanding of the specific robot's physical capabilities. This limitation is particularly critical in interactive navigation, where robots must actively modify cluttered environments to create traversable paths. Existing VLM-based navigators are predominantly confined to passive obstacle avoidance, failing to reason abou

Source

http://arxiv.org/abs/2601.03956v1