sim · medium · offline-rl · metric · varies

Offline-to-Online Reinforcement Learning with Classifier-Free Diffusion Generation

Description

Offline-to-online Reinforcement Learning (O2O RL) fine-tunes an offline pre-trained policy online, with the aim of minimizing costly online interactions. Existing work uses offline datasets to generate data that conforms to the online data distribution for data augmentation. However, the generated data still exhibits a gap with the online data, which limits overall performance. To address this, we propose a new data augmentation approach, Classifier-Free Diffusion Generation (CFDG). Without introduci
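The description is cut off before it explains how CFDG works, so the following is only a generic sketch of classifier-free guidance, the technique the method's name points to. It is not the paper's implementation. It assumes a denoiser that can be queried with and without a condition label (for example, a hypothetical "offline" vs. "online" data tag), and it uses the convention in which a guidance weight of 1 recovers the purely conditional prediction. The names `NULL_LABEL`, `maybe_drop_label`, `cfg_combine`, and `p_uncond` are illustrative assumptions.

```python
import random

# Hypothetical "no condition" token used for the unconditional branch.
NULL_LABEL = None


def maybe_drop_label(label, p_uncond, rng=random):
    """Training-time conditioning dropout: hide the label with probability p_uncond.

    Training one network on both labeled and unlabeled inputs is what removes
    the need for a separate classifier at sampling time.
    """
    return NULL_LABEL if rng.random() < p_uncond else label


def cfg_combine(eps_cond, eps_uncond, guidance_weight):
    """Blend conditional and unconditional noise predictions elementwise.

    guidance_weight = 0 gives the unconditional prediction, 1 gives the
    conditional one, and values above 1 extrapolate past the conditional one.
    """
    return [u + guidance_weight * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy noise predictions standing in for the outputs of a trained denoiser.
eps_cond = [1.0, 2.0]
eps_uncond = [0.0, 0.0]
guided = cfg_combine(eps_cond, eps_uncond, guidance_weight=2.0)
```

In a full sampler, `cfg_combine` would be applied at every denoising step, with the two predictions coming from the same network evaluated once with the label and once with `NULL_LABEL`.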

Source

http://arxiv.org/abs/2508.06806v1