sim · medium · imitation · metric · varies

Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving

Description

End-to-End (E2E) solutions have emerged as a mainstream approach for autonomous driving systems, with Vision-Language-Action (VLA) models representing a new paradigm that leverages pre-trained multimodal knowledge from Vision-Language Models (VLMs) to interpret and interact with complex real-world environments. However, these methods remain constrained by the limitations of imitation learning, which struggles to inherently encode physical rules during training. Existing approaches often rely on

Source

http://arxiv.org/abs/2509.20109v1