sim · medium · robotics · metric: varies

DyGeoVLN: Infusing Dynamic Geometry Foundation Model into Vision-Language Navigation

Description

Vision-Language Navigation (VLN) requires an agent to interpret visual observations and language instructions in order to navigate unseen environments. Most existing approaches assume static scenes and struggle to generalize to dynamic, real-world scenarios. To address this challenge, we propose DyGeoVLN, a dynamic geometry-aware VLN framework. Our method infuses a dynamic geometry foundation model into the VLN framework through cross-branch feature fusion to enable explicit 3D spatial re…
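The description says geometry features are fused into the VLN pipeline through "cross-branch feature fusion" but does not specify the operator. One common way to realize this kind of fusion is cross-attention: tokens from the VLN visual branch act as queries over tokens from the geometry branch, and the attended geometry features are added back residually. The sketch below assumes single-head scaled dot-product cross-attention with a residual add. The function names (`cross_branch_fuse`, `matmul`, `transpose`, `softmax_rows`), the projection matrices, and the token shapes are all hypothetical illustrations, not DyGeoVLN's published design.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major list of rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def cross_branch_fuse(vln_tokens: Matrix, geo_tokens: Matrix,
                      w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical cross-branch fusion (assumed, not from the paper):
    VLN tokens (queries) attend over geometry tokens (keys/values).
    w_v must map geometry features to the VLN feature width for the residual add."""
    q = matmul(vln_tokens, w_q)
    k = matmul(geo_tokens, w_k)
    v = matmul(geo_tokens, w_v)
    d = len(q[0])
    # Scaled dot-product scores: (num_vln_tokens x num_geo_tokens)
    scores = [[x / math.sqrt(d) for x in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(scores)
    attended = matmul(attn, v)
    # Residual connection keeps the original VLN features and adds geometry context.
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(vln_tokens, attended)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    vln = [[1.0, 0.0], [0.0, 1.0]]                  # 2 VLN tokens, width 2
    geo = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]     # 3 geometry tokens, width 2
    print(cross_branch_fuse(vln, geo, identity, identity, identity))
```

With a single geometry token every attention weight is 1, so each fused VLN token is simply its original features plus the projected geometry token; with many geometry tokens, each VLN token receives its own softmax-weighted mixture. A real system would typically use learned multi-head projections and batched tensors rather than nested lists.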

Source

http://arxiv.org/abs/2603.21269v1