sim · medium · quadruped · metric · varies

VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation

Description

A fundamental challenge in robot navigation lies in learning policies that generalize across diverse environments while conforming to the unique physical constraints and capabilities of a specific embodiment (e.g., quadrupeds can walk up stairs, but rovers cannot). We propose VAMOS, a hierarchical VLA that decouples semantic planning from embodiment grounding: a generalist planner learns from diverse, open-world data, while a specialist affordance model learns the robot's physical constraints an

Source

http://arxiv.org/abs/2510.20818v1