sim · medium · offline-rl · metric: varies

Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL

Description

Offline goal-conditioned reinforcement learning remains challenging for long-horizon tasks. While hierarchical approaches mitigate this issue by decomposing tasks, most existing methods rely on separate high- and low-level networks and generate only a single intermediate subgoal, making them inadequate for complex tasks that require coordinating multiple intermediate decisions. To address this limitation, we draw inspiration from the chain-of-thought paradigm and propose the Chain-of-Goals Hiera
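
To make the contrast in the description concrete, here is a minimal toy sketch of the difference between emitting a single intermediate subgoal and emitting a chain of subgoals. It is not the paper's method: the description above is cut off before the architecture is explained, so everything here is an assumption made for illustration. The 1-D state space, the hand-coded (not learned) subgoal rules, and the names `single_subgoal`, `chain_of_goals`, and `low_level_action` are all hypothetical.

```python
from typing import List


def single_subgoal(state: float, goal: float) -> List[float]:
    """Toy stand-in for a hierarchy that emits one intermediate subgoal.

    Assumption: the subgoal is simply the midpoint between state and goal.
    """
    return [(state + goal) / 2.0]


def chain_of_goals(state: float, goal: float, num_subgoals: int) -> List[float]:
    """Toy stand-in for a policy that emits several ordered subgoals.

    Assumption: subgoals are evenly spaced from the state toward the goal,
    and the last element of the chain is the goal itself.
    """
    if num_subgoals < 1:
        raise ValueError("num_subgoals must be at least 1")
    step = (goal - state) / num_subgoals
    return [state + step * (i + 1) for i in range(num_subgoals)]


def low_level_action(state: float, subgoal: float, max_step: float = 1.0) -> float:
    """Move toward the given subgoal, clipped to a maximum step size."""
    delta = subgoal - state
    return max(-max_step, min(max_step, delta))


if __name__ == "__main__":
    start, target = 0.0, 8.0
    print("single subgoal:", single_subgoal(start, target))
    print("chain of goals:", chain_of_goals(start, target, num_subgoals=4))
    # The low-level controller only ever needs to reach the next subgoal.
    print("first action:", low_level_action(start, chain_of_goals(start, target, 4)[0]))
```

In this toy setting the chain gives the low-level controller a nearer target at each stage, which is the intuition behind decomposing a long-horizon task into multiple intermediate decisions. A learned policy would replace both hand-coded rules.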

Source

http://arxiv.org/abs/2602.03389v1