simmediumvision-robotmetric · varies

VLA-IAP: Training-Free Visual Token Pruning via Interaction Alignment for Vision-Language-Action Models

Description

Vision-Language-Action (VLA) models have rapidly advanced embodied intelligence, enabling robots to execute complex, instruction-driven tasks. However, as model capacity and visual context length grow, the inference cost of VLA systems becomes a major bottleneck for real-world deployment on resource-constrained platforms. Existing visual token pruning methods mainly rely on semantic saliency or simple temporal cues, overlooking the continuous physical interaction, a fundamental property of VLA t

Source

http://arxiv.org/abs/2603.22991v1