sim · medium · robotics · metric: varies

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

Description

Vision-Language-Action (VLA) models have recently emerged in autonomous driving, with the promise of leveraging rich world knowledge to improve the cognitive capabilities of driving systems. However, adapting such models to driving tasks currently faces a critical dilemma between spatial perception and semantic reasoning. Consequently, existing VLA systems are forced into suboptimal compromises: directly adopting 2D Vision-Language Models yields limited spatial perception, whereas enhancing the …

Source

http://arxiv.org/abs/2604.02190v1