sim · medium · offline-rl · metric: varies
Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL
Description
Offline goal-conditioned reinforcement learning remains challenging for long-horizon tasks. While hierarchical approaches mitigate this issue by decomposing tasks, most existing methods rely on separate high- and low-level networks and generate only a single intermediate subgoal, making them inadequate for complex tasks that require coordinating multiple intermediate decisions. To address this limitation, we draw inspiration from the chain-of-thought paradigm and propose the Chain-of-Goals Hiera