sim · medium · offline-rl · metric: varies

Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data

Description

Incorporating pre-collected offline data can substantially improve the sample efficiency of reinforcement learning (RL), but its benefits can break down when the transition dynamics in the offline dataset differ from those encountered online. Existing approaches typically mitigate this issue by penalizing or filtering offline transitions in regions with a large dynamics gap. However, their dynamics-gap estimators often rely on KL divergence or mutual information, which can be ill-defined when offl…
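The description mentions estimating the dynamics gap between offline and online transition models, e.g. via KL divergence, and penalizing offline transitions where that gap is large. As a minimal illustrative sketch (not the paper's method), assuming both transition models are diagonal Gaussians over next states, one can compute a closed-form per-transition KL and turn it into an exponential down-weighting factor; the names `gaussian_kl`, `transition_weight`, and the temperature `beta` are hypothetical:

```python
import numpy as np

def gaussian_kl(mu_p, sig_p, mu_q, sig_q):
    """KL(N(mu_p, diag(sig_p^2)) || N(mu_q, diag(sig_q^2))) in closed form.

    Illustrative dynamics-gap proxy: p is the offline transition model,
    q the online one, both diagonal Gaussians over next states.
    """
    mu_p, sig_p = np.asarray(mu_p, float), np.asarray(sig_p, float)
    mu_q, sig_q = np.asarray(mu_q, float), np.asarray(sig_q, float)
    return float(np.sum(
        np.log(sig_q / sig_p)
        + (sig_p**2 + (mu_p - mu_q) ** 2) / (2.0 * sig_q**2)
        - 0.5
    ))

def transition_weight(kl, beta=1.0):
    # Down-weight offline transitions with a large estimated dynamics gap.
    # Weight is 1 when the models agree (KL = 0) and decays toward 0.
    return float(np.exp(-beta * kl))

# Identical models: zero gap, full weight.
kl_same = gaussian_kl([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
# Shifted online dynamics: positive gap, reduced weight.
kl_shift = gaussian_kl([0.0, 0.0], [1.0, 1.0], [0.5, -0.5], [1.0, 1.0])
```

Note that such a KL-based gap is only well-behaved when the two models share support, which is exactly the limitation the description raises about existing estimators.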

Source

http://arxiv.org/abs/2505.23062v4