sim · medium · offline-rl · metric · varies

Offline Reinforcement Learning with Generative Trajectory Policies

Description

Generative models have emerged as a powerful class of policies for offline reinforcement learning (RL) due to their ability to capture complex, multi-modal behaviors. However, existing methods face a stark trade-off: slow, iterative models like diffusion policies are computationally expensive, while fast, single-step models like consistency policies often suffer from degraded performance. In this paper, we demonstrate that it is possible to bridge this gap. The key to moving beyond the limitatio

Source

http://arxiv.org/abs/2510.11499v1