sim · medium · offline-rl · metric · varies

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

Description

Conditional decision generation with diffusion models has proven highly competitive in reinforcement learning (RL). Recent studies reveal a connection between energy-function-guided diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable because of its log-expectation formulation during the generation process. To address this issue, we propose Analytic Energy-guided Policy Optimization (AEPO). Specifically, we first
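As a rough illustration of the log-expectation difficulty mentioned above, the sketch below assumes the intermediate energy takes the form commonly used in energy-guided diffusion, E_t(x_t) = -log E_{x_0 ~ q(x_0 | x_t)}[exp(-beta * E(x_0))]. The abstract does not state this formula, so treat it as an assumption. The 1-D Gaussian posterior, the quadratic energy, and all function names are hypothetical toy choices made only so that a closed form exists to compare against; this is not the AEPO derivation.

```python
import math
import random


def log_mean_exp(values):
    """Numerically stable log(mean(exp(v))) via the max-shift trick."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def mc_intermediate_energy(mu, sigma, beta, energy, n_samples=100_000, seed=0):
    """Monte Carlo estimate of -log E[exp(-beta * energy(x0))].

    Assumption (toy setting): x0 | x_t ~ N(mu, sigma^2) in one dimension.
    """
    rng = random.Random(seed)
    neg_scaled = [-beta * energy(rng.gauss(mu, sigma)) for _ in range(n_samples)]
    return -log_mean_exp(neg_scaled)


def closed_form_quadratic(mu, sigma, beta):
    """Exact value of the same quantity for energy(x0) = 0.5 * x0**2.

    Follows from the Gaussian integral
    E[exp(-beta * x^2 / 2)] = (1 + beta*sigma^2)^(-1/2)
                              * exp(-beta * mu^2 / (2 * (1 + beta*sigma^2))).
    """
    s = 1.0 + beta * sigma ** 2
    return 0.5 * math.log(s) + beta * mu ** 2 / (2.0 * s)


def quadratic_energy(x0):
    # Hypothetical toy energy, chosen only because it admits a closed form.
    return 0.5 * x0 * x0


mc = mc_intermediate_energy(1.0, 0.5, 2.0, quadratic_energy)
exact = closed_form_quadratic(1.0, 0.5, 2.0)
print(f"Monte Carlo estimate: {mc:.4f}  closed form: {exact:.4f}")
```

In this toy case the sampled estimate can be checked against the exact value. For a general energy function and a learned posterior no such closed form is available, and the logarithm of a sampled average is a biased estimator, which gives some intuition for why the abstract calls the quantity intractable.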

Source

http://arxiv.org/abs/2505.01822v1