sim · medium · offline-rl · metric · varies

Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens

Description

Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe, but it often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, which restricts adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At de
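
The phrase "recasting planning as Bayesian posterior estimation" can be illustrated with a generic control-as-inference sketch. This is not RefPlan's algorithm, which the truncated description above does not specify. The sketch assumes a toy one-dimensional linear system, a Gaussian prior over action sequences, and a small set of sampled dynamics coefficients standing in for epistemic uncertainty about the model. The names `rollout_return`, `plan_as_posterior`, and `model_samples` are hypothetical and chosen for illustration only.

```python
import numpy as np


def rollout_return(a_coef, s0, actions):
    """Total reward of each action sequence under the toy model s' = a_coef * s + u."""
    s = np.full(actions.shape[0], float(s0))
    total = np.zeros(actions.shape[0])
    for t in range(actions.shape[1]):
        u = actions[:, t]
        s = a_coef * s + u
        total += -(s ** 2) - 0.01 * u ** 2  # reward: stay near 0, mild action cost
    return total


def plan_as_posterior(s0, model_samples, horizon=5, n_candidates=2048,
                      sigma=1.0, temperature=1.0, seed=0):
    """Approximate the posterior mean plan by self-normalized importance sampling."""
    rng = np.random.default_rng(seed)
    # Prior over plans: i.i.d. Gaussian action sequences.
    actions = rng.normal(0.0, sigma, size=(n_candidates, horizon))
    # Scaled returns under each sampled model, shape (n_models, n_candidates).
    scaled = np.stack(
        [rollout_return(a, s0, actions) for a in model_samples]
    ) / temperature
    # log p(optimal | plan) = log mean_m exp(R_m / temperature), computed stably.
    mx = scaled.max(axis=0)
    log_lik = np.log(np.exp(scaled - mx).mean(axis=0)) + mx
    # Normalized weights approximate the posterior over the sampled plans.
    weights = np.exp(log_lik - log_lik.max())
    weights /= weights.sum()
    return weights @ actions, weights


# Assumed stand-in for posterior samples of the dynamics coefficient.
model_samples = [0.8, 0.9, 1.0]
plan, weights = plan_as_posterior(2.0, model_samples)
print(plan)
```

Averaging the optimality likelihood over `model_samples` before normalizing is what marginalizes the plan posterior over model uncertainty in this toy setting. A real offline planner would replace the linear model with learned dynamics and a learned prior policy.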

Source

http://arxiv.org/abs/2506.06261v1