sim · medium · robotics · metric · varies

CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation

Description

Vision-and-Language Navigation (VLN) requires robots to follow natural language instructions and navigate complex environments without prior maps. Although recent large vision-language models demonstrate strong reasoning abilities, they often underperform task-specific panoramic small models on VLN tasks. To address this, we propose CLASH (Collaborative Large-Small Hierarchy), a framework for VLN in continuous environments (VLN-CE) that integrates a reactive small-model planner (RSMP) with a reflective large-model reasoner (RLMR). RSM

Source

http://arxiv.org/abs/2512.10360v2