sim · medium · offline-rl · metric · varies

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents

Description

With the software industry shifting toward a data-driven culture, online A/B testing is a key tool for evaluating new technologies. However, deploying such experiments requires substantial resources, may negatively impact users, and involves long data collection periods. To address this, off-policy evaluation (OPE), or offline A/B testing, uses logged data to assess technologies and is fundamental in Reinforcement Learning, making it crucial in domains where online testing is costly or

Source

http://arxiv.org/abs/2511.00802v1