sim · medium · robotics · metric · varies

DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment

Description

Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but often fail to generalize to long-tail scenarios because they lack the world knowledge needed to understand and reason about their surroundings. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work, we introduce DiffVLA++, an enhanced a

Source

http://arxiv.org/abs/2510.17148v4