sim · medium · offline-rl · metric: varies
Multi-Fidelity Hybrid Reinforcement Learning via Information Gain Maximization
Description
Optimizing a reinforcement learning (RL) policy typically requires extensive interaction with a high-fidelity simulator of the environment, which is often costly or impractical. Offline RL addresses this problem by training from pre-collected data, but its effectiveness is strongly constrained by the size and quality of the dataset. Hybrid offline-online RL leverages both offline data and interaction with a single simulator of the environment. In many real-world scenarios, however, multiple simulators of the environment are available, each with a different fidelity and interaction cost.
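To make the fidelity/cost trade-off concrete, here is a minimal sketch of the kind of selection rule suggested by the benchmark's title: choosing which simulator to query by maximizing expected information gain per unit cost. All names, noise variances, and costs below are illustrative assumptions, not part of the benchmark's actual method.

```python
import math

def info_gain(prior_var, noise_var):
    # Expected entropy reduction (in nats) of a Gaussian belief after
    # one observation corrupted by Gaussian noise of the given variance.
    return 0.5 * math.log(1.0 + prior_var / noise_var)

def pick_fidelity(prior_var, fidelities):
    # fidelities maps a simulator name to (noise_var, cost).
    # Pick the simulator maximizing information gain per unit cost.
    return max(
        fidelities,
        key=lambda f: info_gain(prior_var, fidelities[f][0]) / fidelities[f][1],
    )

# Hypothetical two-simulator setup: a cheap noisy model and an
# expensive accurate one.
fidelities = {
    "low":  (1.0, 1.0),    # noisy but cheap
    "high": (0.01, 50.0),  # accurate but expensive
}
print(pick_fidelity(prior_var=4.0, fidelities=fidelities))  # → low
```

With a broad prior (high uncertainty), the cheap simulator wins on gain-per-cost; as the belief sharpens, the balance can shift toward the accurate simulator.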