sim · medium · mobile-manipulation · metric: varies

AnywhereVLA: Language-Conditioned Exploration and Mobile Manipulation

Description

We address natural-language pick-and-place in unseen, unpredictable indoor environments with AnywhereVLA, a modular framework for mobile manipulation. A user's text prompt serves as the entry point and is parsed into a structured task graph, which conditions classical SLAM with LiDAR and cameras, metric-semantic mapping, and a task-aware frontier exploration policy. An approach planner then selects visibility- and reachability-aware pre-grasp base poses. For interaction, a compact SmolVLA manipulation …
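The description does not say how the task-aware exploration policy scores frontiers. The following is therefore only a minimal Python sketch of classic frontier detection on a 2D occupancy grid, with a task-relevance term added to make the choice "task-aware." The cell encoding, the `relevance` prior (standing in for whatever the task graph provides), `dist_weight`, and the linear scoring rule are all hypothetical assumptions and do not reproduce AnywhereVLA's implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

# Cell encoding loosely follows the ROS OccupancyGrid convention (assumption).
FREE, OCCUPIED, UNKNOWN = 0, 100, -1
Cell = Tuple[int, int]  # (row, col)


def find_frontiers(grid: List[List[int]]) -> List[Cell]:
    """Return free cells that have at least one unknown 4-neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontiers: List[Cell] = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbor is enough
    return frontiers


def select_frontier(
    grid: List[List[int]],
    robot: Cell,
    relevance: Dict[Cell, float],
    dist_weight: float = 0.1,
) -> Optional[Cell]:
    """Pick the frontier maximizing task relevance minus a travel-distance penalty.

    `relevance` is a hypothetical per-cell prior derived from the task graph
    (e.g. how likely the target object is near that cell); missing cells score 0.
    Returns None when no frontier remains, i.e. the map is fully explored.
    """
    best: Optional[Cell] = None
    best_score = -math.inf
    for cell in find_frontiers(grid):
        dist = math.hypot(cell[0] - robot[0], cell[1] - robot[1])
        score = relevance.get(cell, 0.0) - dist_weight * dist
        if score > best_score:
            best, best_score = cell, score
    return best


if __name__ == "__main__":
    U, O = UNKNOWN, OCCUPIED
    grid = [
        [0, 0, 0, U],
        [0, O, 0, U],
        [0, 0, 0, 0],
        [U, 0, 0, 0],
    ]
    # Hypothetical prior: the task graph points toward the cell at (0, 2).
    print(select_frontier(grid, robot=(2, 2), relevance={(0, 2): 1.0}))  # (0, 2)
```

With an empty `relevance` map, this rule reduces to plain nearest-frontier exploration. In this sketch, the relevance term is the only thing that biases exploration toward regions the parsed task makes likely.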

Source

http://arxiv.org/abs/2509.21006v1