Maximum-Entropy Exploration with Future State-Action Visitation Measures

Description

Maximum entropy reinforcement learning motivates agents to explore states and actions to maximize the entropy of some distribution, typically by providing additional intrinsic rewards proportional to that entropy function. In this paper, we study intrinsic rewards proportional to the entropy of the discounted distribution of state-action features visited during future time steps. This approach is motivated by two results. First, we show that the expected sum of these intrinsic rewards is a lower
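To make the intrinsic reward described above concrete, here is a minimal tabular sketch, not the paper's implementation. It assumes a small finite MDP with known transitions, a fixed policy, identity (one-hot) features over state-action pairs, and a visitation measure that starts counting at the next time step; these choices, and the names `future_visitation` and `intrinsic_reward`, are illustrative assumptions rather than anything taken from the source.

```python
import numpy as np

def future_visitation(P, pi, gamma):
    """Discounted distribution over future state-action pairs (assumed form).

    P:  (S, A, S) array, P[s, a, s2] = Pr(s2 | s, a)
    pi: (S, A) array, pi[s, a] = Pr(a | s)
    Returns d with shape (S*A, S*A); row i is a probability distribution over
    the pairs visited after starting from pair i (pair index = s * A + a).
    """
    S, A, _ = P.shape
    # One-step transition between pairs: T[(s,a),(s2,a2)] = P[s,a,s2] * pi[s2,a2]
    T = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    # (1 - gamma) * sum_{k>=0} gamma^k T^(k+1) = (1 - gamma) * (I - gamma T)^{-1} T
    return (1.0 - gamma) * np.linalg.solve(np.eye(S * A) - gamma * T, T)

def intrinsic_reward(d, n_states, n_actions, eps=1e-12):
    """Shannon entropy of each row of d, reshaped to (S, A)."""
    p = np.clip(d, eps, 1.0)  # avoid log(0); zero-probability terms contribute 0
    return -(d * np.log(p)).sum(axis=1).reshape(n_states, n_actions)

# Toy MDP (hypothetical): action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
pi = np.full((2, 2), 0.5)  # uniform random policy

d = future_visitation(P, pi, gamma=0.9)
r = intrinsic_reward(d, n_states=2, n_actions=2)
print(d[0])  # future visitation measure from the pair (s=0, a=0)
print(r)     # entropy-based intrinsic reward for every pair
```

In this sketch each row of `d` sums to one, and the reward for a pair is largest when the pairs visited after it are spread most evenly, which is the exploration incentive the description refers to. With learned features or unknown dynamics the distribution would have to be estimated rather than solved for in closed form.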

Source

http://arxiv.org/abs/2603.18965v1