sim · medium · rl · metric · varies
RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
Description
Reinforcement learning (RL) holds significant promise for enhancing the agentic reasoning capabilities of large language models (LLMs) that interact with external environments. However, the inherent sparsity of terminal rewards hinders fine-grained, state-level optimization. Although process reward modeling offers a promising alternative, training dedicated reward models often entails substantial computational cost and is difficult to scale. To address these challenges, we introduce RewardFlow, a lightweight