sim · medium · offline-rl · metric · varies

Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning

Description

In environments with sparse or delayed rewards, reinforcement learning (RL) incurs high sample complexity due to the large number of interactions needed for learning. This limitation has motivated the use of large language models (LLMs) for subgoal discovery and trajectory guidance. While LLMs can support exploration, frequent reliance on LLM calls raises concerns about scalability and reliability. We address these challenges by constructing a memory graph that encodes subgoals and trajectories

Source

http://arxiv.org/abs/2602.17931v1