sim · medium · imitation · metric · varies

Driving Beyond Privilege: Distilling Dense-Reward Knowledge into Sparse-Reward Policies

Description

We study how to exploit dense simulator-defined rewards in vision-based autonomous driving without inheriting their misalignment with deployment metrics. In realistic simulators such as CARLA, privileged state (e.g., lane geometry, infractions, time-to-collision) can be converted into dense rewards that stabilize and accelerate model-based reinforcement learning, but policies trained directly on these signals often overfit and fail to generalize when evaluated on sparse objectives such as route

Source

http://arxiv.org/abs/2512.04279v2