sim · medium · offline-rl · metric · varies

Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation

Description

Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, existing applications to decision-making either struggle with suboptimal demonstrations under behavior cloning or rely on complex concurrent training of multiple networks under the actor-critic framework. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning tha

Source

http://arxiv.org/abs/2506.07822v3