Tags: sim · medium · robotics · metric: varies
Posterior Optimization with Clipped Objective for Bridging Efficiency and Stability in Generative Policy Learning
Description
Expressive generative models have advanced robotic manipulation by capturing complex, multi-modal action distributions over temporally extended trajectories. However, fine-tuning these policies via reinforcement learning (RL) remains challenging due to instability and sample inefficiency. We introduce Posterior Optimization with Clipped Objective (POCO), a principled RL framework that formulates policy improvement as a posterior inference problem tailored to temporal action chunks. Through an Expectation-Maximization p
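POCO's exact objective is not reproduced in this excerpt. As a rough illustration of the "clipped objective" idea it builds on, the following sketch applies a PPO-style clipped surrogate to chunk-level log-probabilities (i.e., log-probs summed over each temporal action chunk). The function name, clip range, and inputs are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def chunk_clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """PPO-style clipped surrogate over action chunks.

    logp_new, logp_old : per-chunk log-probabilities under the current
                         and behavior policies (summed over the chunk's steps).
    advantages         : per-chunk advantage estimates.
    eps                : clip range (assumed value; PPO's common default).
    Returns the loss to minimize (negative clipped surrogate).
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    ratio = np.exp(logp_new - logp_old)          # importance ratio per chunk
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (min) combination, as in the clipped surrogate objective.
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies agree (`logp_new == logp_old`), the ratio is 1 everywhere and the loss reduces to the negative mean advantage; clipping only bites when a chunk's importance ratio drifts outside `[1 - eps, 1 + eps]`.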