sim · medium · vision-robot · metric: varies

$Δ$VLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation

Description

Recent vision-language-action (VLA) models have significantly advanced robotic manipulation by unifying perception, reasoning, and control. To achieve this integration, recent studies adopt a predictive paradigm that models future visual states or world knowledge to guide action generation. However, these models emphasize forecasting outcomes rather than reasoning about the underlying process of change, which is essential for determining how to act. To address this, we propose $Δ$VLA, a prior-guided vision-language-action model that leverages variation in world knowledge as a prior for action generation.
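The distinction the abstract draws, forecasting the future state versus reasoning about the change itself, can be illustrated with a toy sketch. The code below is an assumption-laden illustration, not the paper's architecture: the encoder, the latent dimensions, and the idea of conditioning the action head on a predicted latent delta are all hypothetical stand-ins chosen to make the contrast concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): observation, latent, and action sizes
D_OBS, D_LAT, D_ACT = 8, 4, 2
W_enc = rng.normal(size=(D_OBS, D_LAT))      # stand-in visual encoder weights
W_dyn = rng.normal(size=(D_LAT, D_LAT))      # predicts latent *change*
W_act = rng.normal(size=(2 * D_LAT, D_ACT))  # action head over [state, change]

def encode(obs):
    # Toy visual encoder: a single tanh projection
    return np.tanh(obs @ W_enc)

obs = rng.normal(size=D_OBS)
z_t = encode(obs)

# Outcome-forecasting paradigm: predict the next latent state directly
# (residual form: next state = current state + predicted change)
delta_z = z_t @ W_dyn
z_next_pred = z_t + delta_z

# Change-centric paradigm (illustrative): condition the action head on the
# current state *and* the predicted change, so the policy sees "how things
# are about to evolve", not just a forecast of where they end up.
action = np.concatenate([z_t, delta_z]) @ W_act
print(action.shape)  # (2,)
```

In the outcome-forecasting branch only `z_next_pred` would feed downstream; in the change-centric branch the explicit `delta_z` is exposed to the policy, which is the rough intuition behind guiding actions with the process of change rather than its endpoint.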

Source

http://arxiv.org/abs/2603.08361v1