← Back to Benchmarks
simmediumrlmetric · varies
$\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space
Description
Scaling inference-time compute for Large Language Models (LLMs) has unlocked unprecedented reasoning capabilities. However, existing inference-time scaling methods typically rely on inefficient and suboptimal discrete search algorithms or trial-and-error prompting to improve the online policy. In this paper, we propose $\nabla$-Reasoner, an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop to refine the policy on the fly. Our core