sim · medium · quadruped · metric · varies

Embodied Navigation Foundation Model

Description

Navigation is a fundamental capability in embodied AI, representing the intelligence required to perceive and interact within physical environments following language instructions. Despite significant progress in large Vision-Language Models (VLMs), which exhibit remarkable zero-shot performance on general vision-language tasks, their generalization ability in embodied navigation remains largely confined to narrow task settings and embodiment-specific architectures. In this work, we introduce a

Source

http://arxiv.org/abs/2509.12129v2