sim · medium · offline-rl · metric: varies

Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning

Description

Offline-to-Online Reinforcement Learning (O2O RL) faces a critical dilemma in balancing the use of a fixed offline dataset with newly collected online experiences. Standard methods, often relying on a fixed data-mixing ratio, struggle to manage the trade-off between early learning stability and asymptotic performance. To overcome this, we introduce the Adaptive Replay Buffer (ARB), a novel approach that dynamically prioritizes data sampling based on a lightweight metric we call 'on-policyness'.
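The idea of adaptively weighting offline versus online samples can be sketched in a few lines. The snippet below is an illustrative mock-up, not the paper's implementation: the recency-based `on_policyness` proxy, the class name, and the mixing rule are all assumptions for demonstration.

```python
import random


class AdaptiveReplayBuffer:
    """Illustrative sketch of an offline/online replay buffer whose
    sampling ratio adapts to an 'on-policyness' score. The recency-based
    proxy used here is an assumption, not the paper's exact metric."""

    def __init__(self, offline_data, capacity=100_000):
        self.offline = list(offline_data)   # fixed offline dataset
        self.online = []                    # (birth_step, transition) pairs
        self.capacity = capacity
        self.step = 0

    def add(self, transition):
        """Store a newly collected online transition."""
        self.online.append((self.step, transition))
        if len(self.online) > self.capacity:
            self.online.pop(0)
        self.step += 1

    def on_policyness(self, birth_step, halflife=10_000):
        # Recency proxy: transitions collected more recently are assumed
        # to be closer to the current policy's distribution.
        return 0.5 ** ((self.step - birth_step) / halflife)

    def sample(self, batch_size):
        """Draw a batch whose offline/online mix tracks on-policyness."""
        if not self.online:
            return random.sample(self.offline, min(batch_size, len(self.offline)))
        # Adaptive mixing ratio: the mean on-policyness of the online
        # buffer determines how much of the batch comes from online data.
        rho = sum(self.on_policyness(s) for s, _ in self.online) / len(self.online)
        n_online = min(len(self.online), round(rho * batch_size))
        n_offline = batch_size - n_online
        batch = [t for _, t in random.sample(self.online, n_online)]
        batch += random.sample(self.offline, min(n_offline, len(self.offline)))
        return batch
```

Early in training the online buffer is empty or stale, so most samples come from the offline dataset for stability; as fresh on-policy data accumulates, the ratio shifts toward online experience, which is the qualitative behavior the adaptive buffer targets.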

Source

http://arxiv.org/abs/2512.10510v2