sim · medium · offline-rl · metric · varies

Flow-Based Single-Step Completion for Efficient and Expressive Policy Learning

Description

Generative models such as diffusion and flow-matching models offer expressive policies for offline reinforcement learning (RL) by capturing rich, multimodal action distributions. However, their iterative sampling incurs high inference costs, and propagating gradients through many sampling steps makes training unstable. We propose the Single-Step Completion Policy (SSCP), a generative policy trained with an augmented flow-matching objective to predict direct completion vectors from intermediate flow samples, …
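To make "completion vectors from intermediate flow samples" concrete, here is a minimal NumPy sketch under stated assumptions. It is not the paper's method. We assume a linear (rectified-flow style) path x_t = (1 - t)·x0 + t·x1 from Gaussian noise x0 to a dataset action x1. We also assume the completion is parameterized as a predicted direction v scaled by the remaining time 1 - t, so that one jump x_t + (1 - t)·v lands at t = 1. The names `interpolate`, `complete`, `augmented_loss`, `sample_action`, and the `predict(obs, x_t, t)` callable are hypothetical, and so is the two-term loss.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point at time t on the assumed linear path from noise x0 (t=0) to action x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def complete(x_t, t, v):
    """Single-step completion (assumed form): jump from time t straight to t=1 along v."""
    return x_t + (1.0 - t) * v

def augmented_loss(predict, obs, x0, x1, t):
    """Hypothetical augmented objective: a flow-matching term plus a completion term."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict(obs, x_t, t)
    # Flow-matching term: regress onto the constant path velocity x1 - x0.
    fm_term = np.mean((v_pred - (x1 - x0)) ** 2)
    # Completion term: the one-step jump from x_t should land on the dataset action x1.
    completion_term = np.mean((complete(x_t, t, v_pred) - x1) ** 2)
    return fm_term + completion_term

def sample_action(predict, obs, rng, action_dim):
    """Single-step inference: draw noise at t=0 and complete it to an action in one jump."""
    x0 = rng.standard_normal((obs.shape[0], action_dim))
    t0 = np.zeros((obs.shape[0], 1))
    return complete(x0, t0, predict(obs, x0, t0))

# Usage with stand-in data (random arrays, not a real offline-RL dataset).
rng = np.random.default_rng(0)
obs = rng.standard_normal((4, 3))     # batch of 4 states, 3 features each
x0 = rng.standard_normal((4, 2))      # noise samples
x1 = rng.standard_normal((4, 2))      # stand-in dataset actions
t = rng.uniform(size=(4, 1))          # one time per sample, broadcast over action dims
oracle = lambda s, x_t, t_: x1 - x0   # ideal predictor for this fixed batch only
loss = augmented_loss(oracle, obs, x0, x1, t)  # near zero for the oracle
```

Under this linear path, the completion error equals (1 - t) times the velocity error, because x_t + (1 - t)·v - x1 = (1 - t)·(v - (x1 - x0)). The two terms in the sketch therefore differ only in weighting. The paper's augmented objective may use different targets or weights. The design point the sketch shows is that `sample_action` makes one network call per action instead of iterating a sampler, and that is where the inference savings come from.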

Source

http://arxiv.org/abs/2506.21427v3