sim · medium · navigation · metric: varies

EchoVLA: Synergistic Declarative Memory for VLA-Driven Mobile Manipulation

Description

Recent progress in Vision-Language-Action (VLA) models has enabled embodied agents to interpret multimodal instructions and perform complex tasks. However, existing VLAs are mostly confined to short-horizon, table-top manipulation and lack the memory and reasoning capabilities required for mobile manipulation, where agents must coordinate navigation and manipulation under changing spatial contexts. In this work, we present EchoVLA, a memory-aware VLA model for mobile manipulation. EchoVLA incorpor…

Source

http://arxiv.org/abs/2511.18112v2