
Flow-based Policy With Distributional Reinforcement Learning in Trajectory Optimization

Description

Reinforcement Learning (RL) has proven highly effective in addressing complex control and decision-making tasks. However, most traditional RL algorithms parameterize the policy as a diagonal Gaussian distribution. This prevents the policy from capturing multimodal distributions and makes it difficult to cover the full range of optimal solutions in multi-solution problems. In addition, the return is reduced to a mean value, losing its multimodal nature and thus providing insufficient gu
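
The two limitations described above can be illustrated with a small toy sketch. It is not taken from the paper: the two action modes at -1 and +1, the noise level, and the return values are all hypothetical, chosen only to show why a single Gaussian and a mean-valued return can hide multimodal structure.

```python
import random
import statistics

random.seed(0)

# Hypothetical toy problem (an assumption, not from the paper):
# two equally good actions, located near -1 and +1.
modes = (-1.0, 1.0)
actions = [random.choice(modes) + random.gauss(0.0, 0.05) for _ in range(10_000)]

# Maximum-likelihood fit of a single 1-D Gaussian: sample mean and std.
mu = statistics.fmean(actions)
sigma = statistics.pstdev(actions)

# Distance from the fitted mean to the nearest optimal action.
gap = min(abs(mu - m) for m in modes)
print(f"fitted mean={mu:.3f}, std={sigma:.3f}, gap to nearest mode={gap:.3f}")

# Two hypothetical return distributions with identical means:
# one bimodal (-10 or +10), one constant (0).
risky_returns = [-10.0, 10.0] * 500
safe_returns = [0.0] * 1000
same_mean = statistics.fmean(risky_returns) == statistics.fmean(safe_returns)
print("means equal:", same_mean)
```

In this sketch the fitted Gaussian centers between the two modes, on an action that neither mode recommends, and the two return distributions are indistinguishable once reduced to their mean.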

Source

http://arxiv.org/abs/2604.00977v1