sim · medium · navigation · metric · varies
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
Description
Mobile manipulators in households must both navigate and manipulate. This requires a compact, semantically rich scene representation that captures where objects are, how they function, and which parts are actionable. Scene graphs are a natural choice, yet prior work often separates spatial and functional relations, treats scenes as static snapshots without object states or temporal updates, and overlooks information most relevant for accomplishing the current task. To address these limitations,