sim · medium · offline-rl · metric · varies

Flattening Hierarchies with Policy Bootstrapping

Description

Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. H
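The point about sparse rewards and discounting can be illustrated with a small calculation. This is a minimal sketch under assumptions that are not taken from the paper: a deterministic environment, a reward of 1 on reaching the goal and 0 otherwise, and an illustrative discount factor of 0.99. The function name `advantage_gap` is hypothetical. It compares the value of the best primitive action against an action that wastes one step, for a goal a given number of steps away.

```python
def advantage_gap(distance: int, gamma: float = 0.99) -> float:
    """Return Q(best action) - Q(wasted step) for a goal `distance` steps away.

    Assumed setup (illustrative only): reward 1 on reaching the goal,
    0 elsewhere, deterministic transitions, optimal behavior afterwards.
    """
    q_best = gamma ** distance          # goal reached after `distance` steps
    q_wasted = gamma ** (distance + 1)  # one step lost, then optimal behavior
    return q_best - q_wasted            # equals gamma**distance * (1 - gamma)


if __name__ == "__main__":
    # The gap shrinks geometrically as the goal moves farther away.
    for d in (1, 10, 100, 500):
        print(f"distance={d:4d}  gap={advantage_gap(d):.2e}")
```

Under these assumptions the gap equals gamma^d (1 - gamma), so it decays geometrically with the distance d. For distant goals the difference between a good and a poor primitive action becomes small enough that it can be hard to separate from value-estimation error.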

Source

http://arxiv.org/abs/2505.14975v3