Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization

Description

Exploration remains a key challenge in deep reinforcement learning (RL). Optimism in the face of uncertainty is a well-known heuristic with theoretical guarantees in the tabular setting, but it is not fully understood how best to translate the principle to deep RL, which involves online stochastic gradients and deep network function approximators. In this paper we propose a new, differentiable optimistic objective that, when optimized, yields a policy that provably explores effic

Source

http://arxiv.org/abs/2302.09339v2