sim · medium · offline-rl · metric · varies

Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning

Description

Offline goal-conditioned reinforcement learning (GCRL) offers a practical learning paradigm in which goal-reaching policies are trained from abundant state-action trajectory datasets without additional environment interaction. However, offline GCRL still struggles with long-horizon tasks, even with recent advances that employ hierarchical policy structures, such as HIQL. Investigating the root cause of this challenge, we make the following observations. First, performance bottlenecks mainly stem f

Source

http://arxiv.org/abs/2505.12737v2