sim · medium · offline-rl · metric · varies

GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning

Description

Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy R

Source

http://arxiv.org/abs/2505.18763v4