sim · medium · rl · metric · varies

Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models

Description

Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mode. This is rarely a problem for benchmark-style evaluations that assume one correct answer, but many real-world tasks inherently involve multiple valid answers or irreducible uncertainty. Examples include medical diagnosis, ambiguous question answering, and settings with incomplete information.
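To make the idea of an answer distribution collapsing onto one mode concrete, here is a minimal sketch. It is not the paper's method: the sample lists are made-up illustrative data, and `answer_distribution` and `entropy_bits` are hypothetical helper names. It estimates an empirical distribution from repeated samples of a model's answers and uses Shannon entropy as one rough indicator of how much of the distribution survives.

```python
import math
from collections import Counter


def answer_distribution(samples):
    """Empirical distribution over distinct answers in a list of samples."""
    counts = Counter(samples)
    total = len(samples)
    return {answer: count / total for answer, count in counts.items()}


def entropy_bits(dist):
    """Shannon entropy (in bits) of a distribution given as {answer: prob}."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)


# Hypothetical samples for one ambiguous question (illustrative only).
diverse_samples = ["flu", "flu", "cold", "covid", "cold", "flu", "covid", "cold"]
# Hypothetical samples after the distribution has collapsed onto one mode.
collapsed_samples = ["flu"] * 8

diverse = answer_distribution(diverse_samples)
collapsed = answer_distribution(collapsed_samples)

print("diverse:", diverse, "entropy:", entropy_bits(diverse))
print("collapsed:", collapsed, "entropy:", entropy_bits(collapsed))
```

In this toy setup the collapsed model's entropy is zero, while the diverse model keeps probability mass on several plausible answers. Entropy is only one possible summary; it says nothing about whether the retained answers are actually valid.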

Source

http://arxiv.org/abs/2603.24844v1