sim · medium · offline-rl · metric · varies

ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts

Description

A central challenge in reinforcement learning (RL) is its dependence on extensive real-world interaction data to learn task-specific policies. Recent work demonstrates that large language models (LLMs) can mitigate this limitation by generating synthetic experience (referred to as imaginary rollouts) for mastering novel tasks, but progress in this emerging field is hindered by the lack of a standard benchmark. To bridge this gap, we introduce ImagineBench, the first comprehensive benchmark for e

Source

http://arxiv.org/abs/2505.10010v1