sim · medium · offline-rl · metric: varies

Double Check My Desired Return: Transformer with Target Alignment for Offline Reinforcement Learning

Description

Offline reinforcement learning (RL) has achieved significant advances in domains such as robotic control, autonomous driving, and medical decision-making. Most existing methods focus primarily on training policies that maximize the cumulative return achievable from a fixed dataset. However, many real-world applications require precise control over the level of performance a policy attains, rather than simply pursuing the highest possible return. Reinforcement learning via supervised learning (RvS) frames offline RL as a sequence…
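To make "controlling performance by conditioning on a desired return" concrete, here is a minimal sketch of the generic return-conditioned setup used by Decision Transformer style RvS methods. It is an assumption about the general family, not the paper's target-alignment method: each timestep is tagged with its return-to-go R_t = sum over k >= t of gamma^(k-t) * r_k, the model sees interleaved (R_t, s_t, a_t) tokens, and at test time the desired return is decremented by each observed reward. The helper names `returns_to_go`, `interleave_tokens`, and `update_target` are hypothetical.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Compute R_t = sum_{k>=t} gamma^(k-t) * r_k for every step t (backward pass)."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def interleave_tokens(
    rtg: Sequence[float], states: Sequence[object], actions: Sequence[object]
) -> List[Tuple[str, object]]:
    """Hypothetical helper: flatten (R_t, s_t, a_t) triples into one token sequence."""
    if not (len(rtg) == len(states) == len(actions)):
        raise ValueError("rtg, states and actions must have the same length")
    seq: List[Tuple[str, object]] = []
    for r, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", r), ("state", s), ("action", a)])
    return seq


def update_target(target: float, reward: float) -> float:
    """At inference, subtract the reward just received from the desired return."""
    return target - reward


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    rtg = returns_to_go(rewards)  # undiscounted: [3.0, 2.0, 2.0]
    print(rtg)
    print(interleave_tokens(rtg, ["s0", "s1", "s2"], ["a0", "a1", "a2"]))
    # Rolling out toward a desired return of 3.0 after receiving reward 1.0.
    print(update_target(3.0, 1.0))
```

Under this setup, asking for a different performance level amounts to starting the rollout with a different initial target. Whether the policy actually achieves that target is exactly the alignment gap the title refers to.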

Source

http://arxiv.org/abs/2508.16420v2