sim · medium · offline-rl · metric · varies

Dichotomous Diffusion Policy Optimization

Description

Diffusion-based policies have become increasingly popular for solving a wide range of decision-making tasks, owing to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion policies with reinforcement learning (RL) remains challenging. Existing methods either suffer from unstable training because they directly maximize value objectives, or face computational issues because they rely on a crude Gaussian likelihood approximation, which requires

Source

http://arxiv.org/abs/2601.00898v2