sim · medium · offline-rl · metric · varies

Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning

Description

Offline reinforcement learning often relies on behavior regularization that constrains policies to remain close to the dataset distribution. However, such approaches fail to distinguish between high-value and low-value actions in their regularization components. We introduce Guided Flow Policy (GFP), which couples a multi-step flow-matching policy with a distilled one-step actor. The actor directs the flow policy through weighted behavior cloning to focus on cloning high-value actions from the dat
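To make "weighted behavior cloning" for a flow-matching policy concrete, the following is a minimal sketch and not the paper's implementation. It assumes a linear interpolation path between Gaussian noise and the dataset action, and it assumes the per-sample weights come from a softmax over critic values. The description above does not specify GFP's actual weighting rule, so `softmax_weights`, `weighted_flow_matching_loss`, `euler_sample`, and the `temperature` parameter are hypothetical names introduced only for illustration.

```python
import math
import random


def softmax_weights(q_values, temperature=1.0):
    """Hypothetical weighting (an assumption, not GFP's stated rule):
    a softmax over batch Q-values, so higher-value actions weigh more."""
    q_max = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_flow_matching_loss(velocity_fn, states, actions, weights, rng):
    """Weighted conditional flow-matching loss on a linear path.

    x_t = (1 - t) * noise + t * action, target velocity = action - noise.
    Each sample's squared error is scaled by its weight, so low-weight
    (low-value) dataset actions contribute little to the cloning signal.
    """
    loss = 0.0
    for state, action, weight in zip(states, actions, weights):
        t = rng.random()
        noise = [rng.gauss(0.0, 1.0) for _ in action]
        x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
        target = [a - n for n, a in zip(noise, action)]
        pred = velocity_fn(state, x_t, t)
        loss += weight * sum((p - g) ** 2 for p, g in zip(pred, target))
    return loss


def euler_sample(velocity_fn, state, action_dim, steps, rng):
    """Multi-step sampling: integrate the velocity field from noise (t=0)
    towards an action (t=1) with fixed-step Euler updates."""
    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(state, x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In this sketch `velocity_fn(state, x_t, t)` stands in for a learned network. A one-step actor, as mentioned in the description, would replace the `euler_sample` loop with a single forward pass at inference time; how GFP distills that actor and couples it back to the flow policy is not covered here.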

Source

http://arxiv.org/abs/2512.03973v2