sim · medium · navigation · metric · varies
History-Conditioned Spatio-Temporal Visual Token Pruning for Efficient Vision-Language Navigation
Description
Vision-Language Navigation (VLN) enables robots to follow natural-language instructions in visually grounded environments, serving as a key capability for embodied robotic systems. Recent Vision-Language-Action (VLA) models have demonstrated strong navigation performance, but their high computational cost introduces latency that limits real-time deployment. We propose a training-free spatio-temporal vision token pruning framework tailored to VLA-based VLN. We apply spatial token selection to the