TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs

Description

Search-augmented large language models (LLMs) trained with reinforcement learning (RL) have achieved strong results on open-domain question answering (QA), but training remains a significant challenge. Optimization is often unstable because of sparse rewards and difficult credit assignment across reasoning and tool calls. To address this, we introduce Turn-Level Information Potential Reward Shaping (TIPS), a simple framework that assigns dense, turn-level rewards to each reasoning + tool-
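
The excerpt above is cut off before it defines how the turn-level rewards are computed, so the following is only a generic illustration of classical potential-based reward shaping, the technique the method's name points to, and not the paper's formulation. It assumes a hypothetical per-turn potential value (the `potentials` list, the function name, and the `gamma` parameter are all illustrative) and shows how a sparse final reward can be spread into dense per-turn rewards whose sum still telescopes to the final reward plus the net change in potential.

```python
from typing import List, Sequence


def shaped_turn_rewards(
    potentials: Sequence[float],
    final_reward: float,
    gamma: float = 1.0,
) -> List[float]:
    """Generic potential-based shaping over turns (illustrative only).

    potentials[t] is a hypothetical potential Phi of the state before turn t;
    the last entry is the potential of the terminal state, so a trajectory
    with T turns needs T + 1 potentials.
    """
    if len(potentials) < 2:
        raise ValueError("need at least one turn (two potential values)")

    n_turns = len(potentials) - 1
    # Dense shaping term for each turn: gamma * Phi(s_{t+1}) - Phi(s_t).
    rewards = [gamma * potentials[t + 1] - potentials[t] for t in range(n_turns)]
    # The sparse outcome reward is still paid once, on the final turn.
    rewards[-1] += final_reward
    return rewards


# Three turns; the potential rises on the first two turns and is flat on the last.
rewards = shaped_turn_rewards([0.0, 0.25, 0.5, 0.5], final_reward=1.0)
print(rewards)  # [0.25, 0.25, 1.0]
```

With `gamma = 1.0` the shaped rewards sum to `final_reward + potentials[-1] - potentials[0]`, so the shaping changes how credit is distributed across turns without changing which trajectories score highest when all trajectories share the same start and end potentials.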

Source

http://arxiv.org/abs/2603.22293v1