sim · medium · navigation · metric: varies
VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
Description
Vision-Language-Action (VLA) models have shown promise in embodied navigation by unifying perception and planning while inheriting the strong generalization abilities of large VLMs. However, most existing VLA models rely on reactive mappings directly from observations to actions, lacking the explicit reasoning capabilities and persistent memory required for complex, long-horizon navigation tasks. To address these challenges, we propose VLingNav, a VLA model for embodied navigation grounded in linguistic-driven …