sim · medium · grasping · metric: varies

DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding

Description

Effective scene representation is critical to 3D visual grounding, yet existing methods for 3D Visual Grounding are often limited in what they represent: they either focus only on geometric and visual cues or, like traditional 3D scene graphs, lack the multi-dimensional attributes needed for complex reasoning. To bridge this gap, we introduce the Diverse Semantic Map (DSM) framework, a novel scene representation that enriches robust geometric models with a spectrum of VLM-derived semantics.
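To make the idea of a geometry-plus-semantics map concrete, here is a minimal Python sketch of one possible data structure and a simple grounding query over it. It is not the DSM implementation. The `MapObject` class, the `ground` and `closest_to` helpers, the attribute keys (`color`, `material`, `affordance`), and the exact-match filtering rule are all hypothetical, and the attributes, which DSM would derive from a VLM, are hard-coded here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MapObject:
    """One object in a hypothetical semantic map (not the DSM authors' schema)."""
    obj_id: int
    label: str
    center: tuple[float, float, float]  # geometric cue: 3D centroid in metres
    # Hypothetical multi-dimensional attributes, e.g. {"color": {"red"}}.
    # In DSM these would come from a VLM; here they are hard-coded.
    attributes: dict[str, set[str]] = field(default_factory=dict)


def ground(objects, label, required):
    """Return objects with the given label that carry every required attribute value."""
    matches = []
    for obj in objects:
        if obj.label != label:
            continue
        # Every (dimension, value) pair in the query must be present on the object.
        if all(value in obj.attributes.get(dim, set()) for dim, value in required.items()):
            matches.append(obj)
    return matches


def closest_to(candidates, anchor):
    """Geometric tie-breaker: pick the candidate whose centroid is nearest the anchor's."""
    return min(candidates, key=lambda o: math.dist(o.center, anchor.center))


# Toy scene with two mugs and a table (illustrative values only).
scene = [
    MapObject(1, "mug", (0.4, 1.2, 0.8), {"color": {"red"}, "material": {"ceramic"}}),
    MapObject(2, "mug", (2.0, 0.3, 0.8), {"color": {"blue"}, "material": {"plastic"}}),
    MapObject(3, "table", (0.5, 1.0, 0.4), {"affordance": {"support"}}),
]

red_mugs = ground(scene, "mug", {"color": "red"})         # attribute-based query
mug_near_table = closest_to(ground(scene, "mug", {}), scene[2])  # spatial query
```

Attribute filtering narrows the candidate set, and a geometric rule such as distance to an anchor object then disambiguates among the survivors. A fuller system would likely parse the referring expression automatically and score soft attribute matches rather than require exact values.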

Source

http://arxiv.org/abs/2504.08307v2