sim · medium · offline-rl · metric · varies

OffSeeker: Online Reinforcement Learning Is Not All You Need for Deep Research Agents

Description

Deep research agents have shown remarkable potential in handling long-horizon tasks. However, state-of-the-art performance typically relies on online reinforcement learning (RL), which is financially expensive due to extensive API calls. While offline training offers a more efficient alternative, its progress is hindered by the scarcity of high-quality research trajectories. In this paper, we demonstrate that expensive online reinforcement learning is not all you need to build powerful research

Source

http://arxiv.org/abs/2601.18467v2