sim · medium · offline-rl · metric · varies

Evaluation-Time Policy Switching for Offline Reinforcement Learning

Description

Offline reinforcement learning (RL) studies how to learn to solve tasks optimally from a fixed dataset of interactions with the environment. Many off-policy algorithms developed for online learning struggle in the offline setting because they tend to overestimate the behaviour of out-of-distribution actions. Existing offline RL algorithms adapt off-policy algorithms, employing techniques such as constraining the policy or modifying the value function to achieve good performance on individual data
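The overestimation problem mentioned above can be illustrated with a small, self-contained sketch. It is not taken from the paper: it assumes a toy setting in which every action has a true value of 0 and the learned estimates carry zero-mean Gaussian noise, and the function name `max_q_bias` and its parameters are hypothetical. Taking the maximum over such noisy estimates, as value-based off-policy methods do when bootstrapping, yields a positive bias. In the offline setting this bias is not corrected for actions absent from the dataset, since no new interactions are collected.

```python
import random
import statistics


def max_q_bias(num_actions=10, noise_std=1.0, trials=2000, seed=0):
    """Average of max_a Q_hat(a) when the true Q(a) is 0 for every action.

    Toy illustration only: the estimates are pure zero-mean noise, so any
    positive result is bias introduced by the max operator.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        # Noisy value estimates; the true value of each action is 0.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # Greedy bootstrapping picks the largest estimate.
        maxima.append(max(estimates))
    return statistics.mean(maxima)


if __name__ == "__main__":
    print(f"1 action:   {max_q_bias(num_actions=1):.2f}")
    print(f"10 actions: {max_q_bias(num_actions=10):.2f}")
```

With a single action the average stays near the true value of 0, while with more candidate actions the average of the maximum grows, which is one reason policy constraints or pessimistic value functions are used in offline RL.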

Source

http://arxiv.org/abs/2503.12222v1