Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration

Description

The process of discovery requires active exploration -- the act of collecting new and informative data. However, efficient autonomous exploration remains a major unsolved problem. The dominant paradigm addresses this challenge by using Reinforcement Learning (RL) to train agents with intrinsic motivation, maximizing a composite objective of extrinsic and intrinsic rewards. We suggest that this approach incurs unnecessary overhead: while policy optimization is necessary for precise task execution

Source

http://arxiv.org/abs/2603.22273v3