simmediumnavigationmetric · varies

SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning

Description

Autonomous robotic systems require spatio-temporal understanding of dynamic environments to ensure reliable navigation and interaction. While Vision-Language Models (VLMs) provide open-world semantic priors, they lack grounding in 3D geometry and temporal dynamics. Conversely, geometric perception captures structure and motion but remains semantically sparse. We propose SNOW (Scene Understanding with Open-World Knowledge), a training-free and backbone-agnostic framework for unified 4D scene unde

Source

http://arxiv.org/abs/2512.16461v1