sim · medium · rl · metric · varies

RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models

Description

Reinforcement learning (RL) holds significant promise for enhancing the agentic reasoning capabilities of large language models (LLMs) that interact with external environments. However, the inherent sparsity of terminal rewards hinders fine-grained, state-level optimization. Although process reward modeling offers a promising alternative, training dedicated reward models often entails substantial computational cost and is difficult to scale. To address these challenges, we introduce RewardFlow, a lightweight

Source

http://arxiv.org/abs/2603.18859v1