sim · medium · offline-rl · metric: varies
Multi-Fidelity Hybrid Reinforcement Learning via Information Gain Maximization
Description
Optimizing a reinforcement learning (RL) policy typically requires extensive interaction with a high-fidelity simulator of the environment, which is often costly or impractical. Offline RL addresses this problem by training from pre-collected data, but its effectiveness is strongly constrained by the size and quality of the dataset. Hybrid offline-online RL leverages both offline data and interaction with a single simulator of the environment. In many real-world scenarios, however, multiple simulators of the environment are available, each with a different fidelity and interaction cost.
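To make the fidelity/cost trade-off concrete, here is a minimal sketch of the kind of selection rule suggested by the benchmark's title: choosing which simulator to query by maximizing expected information gain per unit cost. All names, noise variances, and costs below are illustrative assumptions, not part of the benchmark's actual method.

```python
import math

def info_gain(prior_var, noise_var):
    # Expected entropy reduction (in nats) of a Gaussian belief after
    # one observation corrupted by Gaussian noise of the given variance.
    return 0.5 * math.log(1.0 + prior_var / noise_var)

def pick_fidelity(prior_var, fidelities):
    # fidelities maps a simulator name to (noise_var, cost).
    # Pick the simulator maximizing information gain per unit cost.
    return max(
        fidelities,
        key=lambda f: info_gain(prior_var, fidelities[f][0]) / fidelities[f][1],
    )

# Hypothetical two-simulator setup: a cheap noisy model and an
# expensive accurate one.
fidelities = {
    "low":  (1.0, 1.0),    # noisy but cheap
    "high": (0.01, 50.0),  # accurate but expensive
}
print(pick_fidelity(prior_var=4.0, fidelities=fidelities))  # → low
```

With a broad prior (high uncertainty), the cheap simulator wins on gain-per-cost; as the belief sharpens, the balance can shift toward the accurate simulator.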