sim · medium · robotics · metric · varies

DVGT-2: Vision-Geometry-Action Model for Autonomous Driving at Scale

Description

End-to-end autonomous driving has evolved from the conventional paradigm based on sparse perception into vision-language-action (VLA) models, which focus on learning language descriptions as an auxiliary task to facilitate planning. In this paper, we propose an alternative Vision-Geometry-Action (VGA) paradigm that advocates dense 3D geometry as the critical cue for autonomous driving. As vehicles operate in a 3D world, we think dense 3D geometry provides the most comprehensive information for d

Source

http://arxiv.org/abs/2604.00813v2