sim · medium · navigation · metric · varies

NavDreamer: Video Models as Zero-Shot 3D Navigators

Description

Previous Vision-Language-Action models face two critical limitations in navigation: diverse data is scarce because collection is labor-intensive, and static representations fail to capture temporal dynamics and physical laws. We propose NavDreamer, a video-based framework for 3D navigation that uses generative video models as a universal interface between language instructions and navigation trajectories. Our main hypothesis is that video's ability to encode spatiotemporal information and physical

Source

http://arxiv.org/abs/2602.09765v1