sim · medium · offline-rl · metric · varies
Double Check My Desired Return: Transformer with Target Alignment for Offline Reinforcement Learning
Description
Offline reinforcement learning (RL) has achieved significant advances in domains such as robotic control, autonomous driving, and medical decision-making. Most existing methods primarily focus on training policies that maximize cumulative returns from a given dataset. However, many real-world applications require precise control over policy performance levels, rather than simply pursuing the best possible return. Reinforcement learning via supervised learning (RvS) frames offline RL as a sequenc