← Back to Benchmarks
simmediumatarimetric · varies

Never Worse, Mostly Better: Stable Policy Improvement in Deep Reinforcement Learning

Description

In recent years, there has been significant progress in applying deep reinforcement learning (RL) for solving challenging problems across a wide variety of domains. Nevertheless, convergence of various methods has been shown to suffer from inconsistencies, due to algorithmic instability and variance, as well as stochasticity in the benchmark environments. Particularly, despite the fact that the agent's performance may be improving on average, it may abruptly deteriorate at late stages of trainin

Source

http://arxiv.org/abs/1910.01062v3