sim · medium · offline-rl · metric · varies

How to Provably Improve Return Conditioned Supervised Learning?

Description

In sequential decision-making problems, Return-Conditioned Supervised Learning (RCSL) has gained increasing recognition for its simplicity and stability. Unlike traditional offline reinforcement learning (RL) algorithms, RCSL frames policy learning as a supervised learning problem that takes both the state and the return as input. This approach eliminates the instability often associated with temporal difference (TD) learning in offline RL. However, RCSL has been criti…
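The generic RCSL recipe described above can be illustrated with a small sketch. It assumes a discrete, tabular setting: every trajectory is relabeled with its return-to-go, and a policy pi(a | s, g) is fit by maximum likelihood, which for a tabular categorical policy reduces to counting actions per (state, return-to-go) pair. At test time the policy is queried with a desired target return. All names (`returns_to_go`, `fit_tabular_rcsl`, `act`) are hypothetical, exact matching on the return value is a simplification, and this is not the paper's proposed method.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

# Hypothetical data format: a trajectory is a list of (state, action, reward) triples.
Trajectory = List[Tuple[Hashable, Hashable, float]]


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Compute g_t = sum_{k >= t} gamma^(k - t) * r_k for every step t."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each step reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def fit_tabular_rcsl(trajectories: Sequence[Trajectory], gamma: float = 1.0):
    """Maximum-likelihood tabular policy pi(a | s, g) from (state, return-to-go) -> action pairs."""
    counts: Dict[Tuple[Hashable, float], Counter] = defaultdict(Counter)
    for traj in trajectories:
        rewards = [r for _, _, r in traj]
        # Supervised-learning input is (state, return-to-go); the label is the logged action.
        for (state, action, _), g in zip(traj, returns_to_go(rewards, gamma)):
            counts[(state, g)][action] += 1
    # Normalize counts into a categorical distribution per input.
    policy = {}
    for key, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[key] = {a: n / total for a, n in action_counts.items()}
    return policy


def act(policy, state: Hashable, target_return: float) -> Optional[Hashable]:
    """Most likely action given the state and desired return; None if that input never appeared."""
    dist = policy.get((state, target_return))
    if dist is None:
        return None
    return max(dist, key=dist.get)


# Usage: two one-step trajectories from the same state with different outcomes.
data = [[("s0", "left", 0.0)], [("s0", "right", 1.0)]]
pi = fit_tabular_rcsl(data)
print(act(pi, "s0", 1.0))  # conditioning on the higher return selects "right"
```

Conditioning on the return is what lets a single supervised model reproduce different behaviors from the same dataset. Practical RCSL implementations replace the lookup table with a neural network so that the policy can generalize across states and continuous return values; the table here only keeps the example small and exact.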

Source

http://arxiv.org/abs/2506.08463v1