sim · medium · navigation · metric · varies

MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

Description

Mobile manipulators in households must both navigate and manipulate. This requires a compact, semantically rich scene representation that captures where objects are, how they function, and which of their parts are actionable. Scene graphs are a natural choice, yet prior work often models spatial and functional relations separately, treats scenes as static snapshots without object states or temporal updates, and overlooks the information most relevant to the current task. To address these limitations, the paper introduces MomaGraph, a state-aware unified scene graph used with a vision-language model for embodied task planning.
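The description lists four properties a household scene graph should have: spatial and functional relations in one structure, per-object states, updates over time, and actionable parts, plus a way to focus on what the current task needs. The Python sketch below is a hypothetical illustration of one way a single graph could hold all of these. The class names, fields, relation labels, and the `task_subgraph` filter are assumptions made for illustration, not MomaGraph's actual schema or algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical sketch: names, fields, and relation labels are illustrative
# assumptions, not MomaGraph's actual schema.

EDGE_KINDS = {"spatial", "functional"}


@dataclass
class ObjectNode:
    name: str
    state: dict[str, str] = field(default_factory=dict)  # e.g. {"door": "closed"}
    parts: list[str] = field(default_factory=list)  # actionable parts, e.g. ["handle"]


@dataclass
class SceneGraph:
    nodes: dict[str, ObjectNode] = field(default_factory=dict)
    # Each edge is (subject, relation, object, kind); kind is "spatial" or "functional".
    edges: set[tuple[str, str, str, str]] = field(default_factory=set)
    step: int = 0  # incremented on every state update, so the graph is not a static snapshot

    def add_object(self, node: ObjectNode) -> None:
        self.nodes[node.name] = node

    def relate(self, subj: str, rel: str, obj: str, kind: str) -> None:
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        if subj not in self.nodes or obj not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        # Spatial and functional relations share one edge set instead of two graphs.
        self.edges.add((subj, rel, obj, kind))

    def update_state(self, name: str, key: str, value: str) -> None:
        # Record an observed state change (e.g. a door opening) and advance time.
        self.nodes[name].state[key] = value
        self.step += 1

    def relations(self, name: str, kind: str | None = None) -> list[tuple[str, str, str, str]]:
        # All edges touching `name`, optionally filtered to one kind; sorted for determinism.
        return sorted(
            e for e in self.edges
            if name in (e[0], e[2]) and (kind is None or e[3] == kind)
        )

    def task_subgraph(self, names: set[str]) -> SceneGraph:
        # Keep only the objects a task mentions (that exist) and the edges among them.
        # Node objects are shared with the full graph, not copied.
        sub = SceneGraph(step=self.step)
        for n in names:
            if n in self.nodes:
                sub.nodes[n] = self.nodes[n]
        sub.edges = {e for e in self.edges if e[0] in sub.nodes and e[2] in sub.nodes}
        return sub


# Usage: a tiny kitchen/living-room scene.
g = SceneGraph()
for obj in (
    ObjectNode("fridge", {"door": "closed"}, ["handle"]),
    ObjectNode("milk"),
    ObjectNode("light_switch", parts=["toggle"]),
    ObjectNode("lamp", {"power": "off"}),
):
    g.add_object(obj)
g.relate("milk", "inside", "fridge", "spatial")
g.relate("light_switch", "controls", "lamp", "functional")
g.update_state("fridge", "door", "open")
print(g.relations("lamp", kind="functional"))
# [('light_switch', 'controls', 'lamp', 'functional')]
```

Keeping both relation kinds in one edge set lets a planner ask "what is the milk inside?" and "what turns on the lamp?" against the same structure, while the `step` counter and mutable `state` stand in for temporal updates. The name-based task filter is only a placeholder for whatever task-relevance selection a real system would use.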

Source

http://arxiv.org/abs/2512.16909v2