sim · medium · locomotion · metric · varies
Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies
Description
Reinforcement learning (RL) policies are prone to high-frequency oscillations, which are especially undesirable when deploying to hardware in the real world. In this paper, we identify, categorize, and compare methods from the literature that aim to mitigate high-frequency oscillations in deep RL. We define two broad classes: loss regularization and architectural methods. At their core, these methods incentivize learning a smooth mapping, such that nearby states in the input space produce nearby actions i
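The loss-regularization idea described above, that nearby states should produce nearby actions, can be illustrated with a minimal sketch. This is not the benchmark's or the paper's implementation: the toy linear policy, the function names `policy` and `smoothness_penalty`, and the noise scale `sigma` are all assumptions chosen for illustration. In practice, a term like this would typically be added to the policy loss with a weighting coefficient.

```python
import numpy as np


def policy(states, weights):
    """Toy linear policy (hypothetical stand-in for a neural network)."""
    return states @ weights


def smoothness_penalty(states, weights, sigma=0.05, seed=0):
    """Mean squared distance between actions at each state and at a
    nearby state obtained by adding Gaussian noise of scale sigma.

    Illustrative regularizer only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    nearby = states + sigma * rng.standard_normal(states.shape)
    diff = policy(states, weights) - policy(nearby, weights)
    return float(np.mean(np.sum(diff ** 2, axis=1)))


# Compare a gently sloped mapping with a steeper one on the same states.
data_rng = np.random.default_rng(42)
states = data_rng.standard_normal((256, 4))    # 256 states, 4 dimensions
smooth_w = 0.1 * data_rng.standard_normal((4, 2))  # 2 action dimensions
sharp_w = 10.0 * smooth_w                      # same mapping, 10x steeper

smooth_pen = smoothness_penalty(states, smooth_w)
sharp_pen = smoothness_penalty(states, sharp_w)
print(smooth_pen < sharp_pen)
```

The steeper mapping receives the larger penalty because the same small change in state moves its action further, which is the behavior a smoothness regularizer is meant to discourage.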