
IRL-VLA: Training a Vision-Language-Action Policy via Reward World Model

Description

Vision-Language-Action (VLA) models have demonstrated potential in autonomous driving. However, two critical challenges hinder their development: (1) existing VLA architectures are typically based on imitation learning in an open-loop setup, which tends to reproduce the recorded behaviors in the dataset, leading to suboptimal and constrained performance; (2) closed-loop training relies heavily on high-fidelity sensor simulation, where domain gaps and computational inefficiencies pose significant barriers.

Source

http://arxiv.org/abs/2508.06571v3