SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards

Description

Large language models (LLMs) are increasingly deployed as multi-step decision-making agents, where effective reward design is essential for guiding learning. Although recent work explores various forms of reward shaping and step-level credit assignment, a key signal remains largely overlooked: the intrinsic uncertainty of LLMs. Uncertainty reflects model confidence, reveals where exploration is needed, and offers valuable learning cues even in failed trajectories. We introduce SELAUR: Self Evolving LLM Agent via Uncertainty-aware Rewards.

Source

http://arxiv.org/abs/2602.21158v2