
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space

Description

Scaling inference-time compute for Large Language Models (LLMs) has unlocked unprecedented reasoning capabilities. However, existing inference-time scaling methods typically rely on inefficient and suboptimal discrete search algorithms or trial-and-error prompting to improve the online policy. In this paper, we propose $\nabla$-Reasoner, an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop to refine the policy on the fly. Our core
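The core idea of refining a policy by gradient descent on token logits can be illustrated with a toy sketch. This is not the paper's method: the target distribution below is a hypothetical stand-in for whatever differentiable objective (e.g., a reward or verifier score) guides the refinement, and the vocabulary is a four-token toy. For cross-entropy loss against a target distribution, the gradient with respect to the logits has the closed form `softmax(logits) - target`, so no autograd library is needed here.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def refine_logits(logits, target_probs, lr=0.5, steps=50):
    """Gradient-descend token logits toward a target distribution.

    Loss: L = -sum(target * log softmax(logits)).
    Gradient w.r.t. logits: softmax(logits) - target.
    """
    logits = logits.astype(float).copy()
    for _ in range(steps):
        grad = softmax(logits) - target_probs
        logits -= lr * grad
    return logits

# Toy 4-token vocabulary: the initial policy prefers token 0,
# while the (hypothetical) objective prefers token 2.
init = np.array([2.0, 0.0, -1.0, 0.0])
target = np.array([0.05, 0.05, 0.85, 0.05])
refined = refine_logits(init, target)
```

After a few dozen steps the refined distribution concentrates on token 2, i.e., the decoding policy has been updated on the fly by gradient steps rather than by discrete search or re-prompting.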

Source

http://arxiv.org/abs/2603.04948v1