← Back to Benchmarks
simmediumoffline-rlmetric · varies

Scaling Open-Ended Reasoning to Predict the Future

Description

High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news, using a fully automated, careful curation recipe. We train the Qwen3 thinking models on our dataset, OpenForesight. To prevent leakage of future information during training and evaluation, we use an offline ne

Source

http://arxiv.org/abs/2512.25070v2