sim · medium · rl · metric · varies
GoldenStart: Q-Guided Priors and Entropy Control for Distilling Flow Policies
Description
Flow-matching policies hold great promise for reinforcement learning (RL) by capturing complex, multi-modal action distributions. However, their practical application is often hindered by prohibitive inference latency and ineffective online exploration. Although recent works have employed one-step distillation for fast inference, the structure of the initial noise distribution remains an overlooked factor that presents significant untapped potential. This overlooked factor, along with the challe