sim · medium · rl · metric · varies
Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration
Description
The process of discovery requires active exploration -- the act of collecting new and informative data. However, efficient autonomous exploration remains a major unsolved problem. The dominant paradigm addresses this challenge by using Reinforcement Learning (RL) to train agents with intrinsic motivation, maximizing a composite objective of extrinsic and intrinsic rewards. We suggest that this approach incurs unnecessary overhead: while policy optimization is necessary for precise task execution