sim · medium · offline-rl · metric: varies

Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization

Description

Offline-to-online deployment of reinforcement-learning (RL) agents must bridge two gaps: (1) the sim-to-real gap, where real systems add latency and other imperfections not present in simulation, and (2) the interaction gap, where policies trained purely offline face out-of-distribution states during online execution because gathering new interaction data is costly or risky. Agents therefore have to generalize from static, delay-free datasets to dynamic, delay-prone environments. Standard offline
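A common way to make a policy delay-robust is to act on a belief state: the most recent (delayed) observation augmented with the buffer of actions taken since it was produced. The sketch below illustrates the idea on a hypothetical toy environment and wrapper (`ToyEnv` and `DelayWrapper` are illustrative names, not from the paper); it is a minimal sketch of the belief construction, not the paper's method.

```python
from collections import deque

class ToyEnv:
    """Minimal 1-D integrator: the state advances by the chosen action."""
    def __init__(self):
        self.state = 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state += action
        return self.state, -abs(self.state), False  # obs, reward, done

class DelayWrapper:
    """Delivers observations `delay` steps late and exposes a belief:
    the last delayed observation plus the actions taken since it was emitted."""
    def __init__(self, env, delay=2):
        self.env, self.delay = env, delay
        self.obs_queue = deque()
        self.action_buffer = deque(maxlen=delay)
    def reset(self):
        obs = self.env.reset()
        # Pad the queue so the first `delay` steps replay the initial observation.
        self.obs_queue = deque([obs] * (self.delay + 1))
        self.action_buffer.clear()
        return self.belief()
    def belief(self):
        return (self.obs_queue[0], tuple(self.action_buffer))
    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.obs_queue.append(obs)   # newest obs enters the pipe...
        self.obs_queue.popleft()     # ...oldest obs is finally delivered
        self.action_buffer.append(action)
        return self.belief(), reward, done
```

For this deterministic toy dynamics the belief is sufficient: the current (unobserved) state can be reconstructed as the delayed observation plus the sum of buffered actions, which is exactly the information a delay-robust policy needs.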

Source

http://arxiv.org/abs/2506.00131v2