sim · medium · offline-rl · metric · varies

Test-Time Graph Search for Goal-Conditioned Reinforcement Learning

Description

Offline goal-conditioned reinforcement learning (GCRL) trains policies that reach user-specified goals at test time, providing a simple, unsupervised, domain-agnostic way to extract diverse behaviors from unlabeled, reward-free datasets. Nonetheless, long-horizon decision making remains difficult for GCRL agents due to temporal credit assignment and error accumulation, and the offline setting amplifies these effects. To alleviate this issue, we introduce Test-Time Graph Search (TTGS), a lightwei

Source

http://arxiv.org/abs/2510.07257v1