sim · medium · offline-rl · metric: varies

Modular Diffusion Policy Training: Decoupling and Recombining Guidance and Diffusion for Offline RL

Description

Classifier-free guidance has shown strong potential in diffusion-based reinforcement learning. However, existing methods rely on joint training of the guidance module and the diffusion model, which can be suboptimal in the early stages of training, when the guidance is still inaccurate and provides noisy learning signals. In offline RL, the guidance depends solely on the offline data (observations, actions, and rewards) and is independent of the policy module's behavior, suggesting that joint training is not required.
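For context, classifier-free guidance combines a conditional and an unconditional denoiser prediction with a guidance weight at each sampling step. The sketch below shows this standard combination rule on toy vectors; it is illustrative only, not the paper's training procedure, and the `guided_noise` helper and the toy values are assumptions.

```python
import numpy as np

def guided_noise(eps_cond, eps_uncond, w):
    """Classifier-free guidance combination: w = 0 recovers the
    unconditional prediction, w = 1 the conditional one, and w > 1
    amplifies the conditioning signal."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy stand-ins for the two denoiser outputs at one diffusion step.
eps_uncond = np.array([0.2, -0.1])
eps_cond = np.array([0.5, 0.3])

print(guided_noise(eps_cond, eps_uncond, 0.0))  # unconditional prediction
print(guided_noise(eps_cond, eps_uncond, 1.0))  # conditional prediction
print(guided_noise(eps_cond, eps_uncond, 2.0))  # amplified guidance
```

Because the combination is a fixed arithmetic rule at sampling time, the two predictors need not share a training loop, which is the intuition behind decoupling guidance from the diffusion model.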

Source

http://arxiv.org/abs/2506.03154v1