sim · medium · navigation · metric · varies

SceneVGGT: VGGT-based online 3D semantic SLAM for indoor scene understanding and navigation

Description

We present SceneVGGT, a spatio-temporal 3D scene understanding framework that combines SLAM with semantic mapping for autonomous and assistive navigation. Built on VGGT, our method scales to long video streams through a sliding-window pipeline. Local submaps are aligned using camera-pose transformations, which keeps mapping memory- and speed-efficient while preserving geometric consistency. Semantics are lifted from 2D instance masks to 3D objects using the VGGT tracking head, maintaining temporally coherent …
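To make the sliding-window idea concrete, here is a minimal sketch of splitting a stream into overlapping windows and chaining each window's local submap into one global frame via the camera pose of a shared frame. It is not the SceneVGGT implementation. The function names (`sliding_windows`, `align_submaps`, `transform_points`) are hypothetical. The sketch assumes that each window's poses are camera-to-local 4x4 matrices expressed in the frame of that window's first camera (the convention VGGT uses for its predicted cameras), that consecutive windows share at least one frame, and that all windows share a common scale. Windows processed independently may not share a scale; this sketch ignores that and performs rigid (SE(3)) alignment only.

```python
import numpy as np


def sliding_windows(num_frames, window, overlap):
    """Split frame indices 0..num_frames-1 into windows sharing `overlap` frames."""
    if not 0 < overlap < window:
        raise ValueError("require 0 < overlap < window")
    step = window - overlap
    windows, start = [], 0
    while True:
        end = min(start + window, num_frames)
        windows.append(list(range(start, end)))
        if end >= num_frames:
            break
        start += step
    return windows


def align_submaps(windows, local_poses):
    """Chain per-window poses into the first window's frame (hypothetical helper).

    local_poses[k] has shape (len(windows[k]), 4, 4): camera-to-local poses in
    the frame of window k's first camera (assumed convention). The first frame
    of each window k > 0 also appears in window k - 1 and anchors the alignment.
    """
    global_poses = {}
    world_from_prev = np.eye(4)
    for k, (frames, poses) in enumerate(zip(windows, local_poses)):
        if k == 0:
            world_from_cur = np.eye(4)
        else:
            shared = frames[0]
            prev_pose = local_poses[k - 1][windows[k - 1].index(shared)]
            # Maps window-k local coordinates into window-(k-1) local coordinates.
            prev_from_cur = prev_pose @ np.linalg.inv(poses[0])
            world_from_cur = world_from_prev @ prev_from_cur
        for f, pose in zip(frames, poses):
            # Overlapping frames keep the estimate from the earlier window.
            global_poses.setdefault(f, world_from_cur @ pose)
        world_from_prev = world_from_cur
    return global_poses


def transform_points(T, points):
    """Apply a 4x4 rigid transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]


if __name__ == "__main__":
    # Synthetic trajectory: rotate about z while translating; frame 0 is identity.
    def pose(i):
        c, s = np.cos(0.1 * i), np.sin(0.1 * i)
        T = np.eye(4)
        T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
        T[:3, 3] = [i, 0.5 * i, 0.0]
        return T

    gt = [pose(i) for i in range(10)]
    wins = sliding_windows(10, window=4, overlap=1)
    # Simulate per-window predictions expressed in each window's first camera frame.
    local = [np.stack([np.linalg.inv(gt[w[0]]) @ gt[f] for f in w]) for w in wins]
    est = align_submaps(wins, local)
    print(wins)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
    print(all(np.allclose(est[i], gt[i]) for i in range(10)))  # True
```

Because every window is re-expressed through a frame it shares with its predecessor, only one window's tokens need to be held at a time; the trade-off is that pose errors accumulate along the chain, which a full SLAM system would typically correct with additional constraints.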

Source

http://arxiv.org/abs/2602.15899v2