simmediumimitationmetric · varies

MePoly: Max Entropy Polynomial Policy Optimization

Description

Stochastic Optimal Control provides a unified mathematical framework for solving complex decision-making problems, encompassing paradigms such as maximum entropy reinforcement learning(RL) and imitation learning(IL). However, conventional parametric policies often struggle to represent the multi-modality of the solutions. Though diffusion-based policies are aimed at recovering the multi-modality, they lack an explicit probability density, which complicates policy-gradient optimization. To bridge

Source

http://arxiv.org/abs/2602.17832v1