sim · medium · rl · metric · varies

GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies

Description

Flow-matching policies hold great promise for reinforcement learning (RL) by capturing complex, multi-modal action distributions. However, their practical application is often hindered by prohibitive inference latency and ineffective online exploration. Although recent works have employed one-step distillation for fast inference, the structure of the initial noise distribution remains an overlooked factor that presents significant untapped potential. This overlooked factor, along with the challe

Source

http://arxiv.org/abs/2603.14245v1