sim · medium · navigation · metric · varies

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation

Description

Vision-Language Models (VLMs) have shown remarkable progress in Vision-Language Navigation (VLN), offering new possibilities for navigation decision-making that could benefit both robotic platforms and human users. However, real-world navigation is inherently conditioned by the agent's mobility constraints. For example, a sweeping robot cannot traverse stairs, while a quadruped can. We introduce Capability-Conditioned Navigation (CapNav), a benchmark designed to evaluate how well VLMs can navigate…
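The sweeping-robot versus quadruped example can be made concrete with a small sketch of what "capability-conditioned" means. The same indoor map can yield a route for one agent and no route for another. This is not CapNav's task format or evaluation code. The map, the terrain labels, the `CAPABILITIES` profiles, and the `plan` helper are all hypothetical. The benchmark evaluates VLMs, whereas here a plain breadth-first search over a hand-written graph stands in as a simple reference planner.

```python
from collections import deque

# Hypothetical indoor map: each edge is (room_a, room_b, terrain_type).
EDGES = [
    ("hall", "kitchen", "flat"),
    ("hall", "stairwell", "flat"),
    ("stairwell", "upstairs_bedroom", "stairs"),
    ("kitchen", "garage", "flat"),
]

# Hypothetical capability profiles: the terrain types each agent can traverse.
CAPABILITIES = {
    "sweeping_robot": {"flat"},
    "quadruped": {"flat", "stairs"},
}


def plan(start, goal, agent):
    """Return the shortest room sequence from start to goal that uses only
    terrain the agent can traverse, or None if the goal is unreachable."""
    allowed = CAPABILITIES[agent]
    # Build an undirected adjacency list, dropping edges the agent cannot use.
    adj = {}
    for a, b, terrain in EDGES:
        if terrain in allowed:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    # Breadth-first search, recording each room's predecessor.
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


print(plan("hall", "upstairs_bedroom", "quadruped"))
print(plan("hall", "upstairs_bedroom", "sweeping_robot"))
```

Encoding a capability as the set of traversable terrain types keeps the map fixed and moves all agent-specific constraints into one profile. With this map, the quadruped reaches the bedroom via the stairwell. The sweeping robot gets `None` because the only route to that room uses stairs.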

Source

http://arxiv.org/abs/2602.18424v1