sim · medium · offline-rl · metric · varies

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Description

Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this scales poorly with long horizons and diverse tools and generalizes weakly to new scenarios. Agentic systems offer a promising alternative by decomposing work across specialized modules, yet most remain training-free or rely on offline training decoupled from the li

Source

http://arxiv.org/abs/2510.05592v1