sim · medium · offline-rl · metric: varies
OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents
Description
Deep research agents have shown remarkable potential in handling long-horizon tasks. However, state-of-the-art performance typically relies on online reinforcement learning (RL), which is financially expensive due to extensive API calls. While offline training offers a more efficient alternative, its progress is hindered by the scarcity of high-quality research trajectories. In this paper, we demonstrate that expensive online reinforcement learning is not all you need to build powerful research