sim · medium · imitation · metric: varies

MUVLA: Learning to Explore Object Navigation via Map Understanding

Description

In this paper, we present MUVLA, a Map Understanding Vision-Language-Action model tailored for object navigation. It leverages semantic map abstractions to unify and structure historical information, encoding spatial context in a compact and consistent form. MUVLA takes the current and historical observations, together with the semantic map, as inputs and predicts an action sequence conditioned on a description of the goal object. Furthermore, it amplifies supervision through reward-guided return modeling ba…
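The description above is cut off before it explains the reward-guided return modeling. The sketch below assumes "return" has its usual reinforcement-learning meaning: the discounted return-to-go G_t = r_t + γ·G_{t+1}, computed backwards over an episode. This is the quantity that return-conditioned policies are typically trained to model. The function name `returns_to_go`, the discount `gamma`, and the example reward values (a small per-step penalty plus +1 on reaching the goal object) are all hypothetical and are not taken from MUVLA.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return G_t for every step t of one episode.

    G_t = r_t + gamma * G_{t+1}, with the return past the final step taken as 0.
    (Hypothetical helper; MUVLA's exact formulation is not given in this summary.)
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Iterate backwards so each G_t can reuse the already-computed G_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical object-navigation episode: -0.01 per step, +1.0 when the goal is reached.
episode_rewards = [-0.01, -0.01, -0.01, 1.0]
print(returns_to_go(episode_rewards))
```

Labelling each step with its return-to-go, rather than only with the final success signal, gives every action in a trajectory its own target value. This is one plausible reading of "amplifying supervision" from sparse navigation rewards.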

Source

http://arxiv.org/abs/2509.25966v1