sim · medium · robotics · metric · varies

Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model

Description

Learning diverse and high-fidelity traffic simulations from human driving demonstrations is crucial for autonomous driving evaluation. The recent next-token prediction (NTP) paradigm, widely adopted in large language models (LLMs), has been applied to traffic simulation and achieves iterative improvements via supervised fine-tuning (SFT). However, such methods limit active exploration of potentially valuable motion tokens, particularly in suboptimal regions. Entropy patterns provide a promising

Source

http://arxiv.org/abs/2603.24989v1