sim · medium · locomotion · metric · varies

Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies

Description

Reinforcement learning (RL) policies are prone to high-frequency oscillations, which are especially undesirable when deploying to hardware in the real world. In this paper, we identify, categorize, and compare methods from the literature that aim to mitigate high-frequency oscillations in deep RL. We define two broad classes: loss regularization and architectural methods. At their core, these methods incentivize learning a smooth mapping, such that nearby states in the input space produce nearby actions i
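
To make the "nearby states produce nearby actions" idea concrete, here is a minimal sketch of one generic way a loss-regularization term could be written. It is not the formulation of any specific method compared in the paper. The names smoothness_penalty, regularized_loss, sigma, and lam are hypothetical, and the policies are plain Python functions standing in for a neural network.

```python
import random


def smoothness_penalty(policy, states, sigma=0.05, seed=0):
    """Mean squared distance between the action at each state and the
    action at a nearby state (Gaussian perturbation with std `sigma`).

    Assumption: `policy` maps a list of floats (state) to a list of
    floats (action). All names here are illustrative only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for s in states:
        s_near = [x + rng.gauss(0.0, sigma) for x in s]
        a = policy(s)
        a_near = policy(s_near)
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, a_near))
    return total / len(states)


def regularized_loss(task_loss, penalty, lam=0.1):
    """Task loss plus a weighted smoothness term (assumed additive form)."""
    return task_loss + lam * penalty


def gentle_policy(s):
    # Low-gain linear map: small state changes give small action changes.
    return [0.5 * sum(s)]


def steep_policy(s):
    # High-gain linear map: small state changes give large action changes.
    return [50.0 * sum(s)]


if __name__ == "__main__":
    states = [[0.1, 0.2], [0.3, -0.4], [-0.2, 0.05]]
    print("gentle:", smoothness_penalty(gentle_policy, states))
    print("steep: ", smoothness_penalty(steep_policy, states))
```

With the same perturbations, the high-gain policy receives a much larger penalty than the low-gain one, so adding the term to the training loss pushes the optimizer toward the smoother mapping. The weight lam trades task performance against smoothness and would need tuning per task.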

Source

http://arxiv.org/abs/2410.16632v1