sim · medium · offline-rl · metric · varies

TROFI: Trajectory-Ranked Offline Inverse Reinforcement Learning

Description

In offline reinforcement learning, agents are trained using only a fixed set of stored transitions collected by a source policy. This setting, however, requires every transition in the dataset to be labeled by a reward function. In applied settings such as video game development, a reward function is not always available. This paper proposes Trajectory-Ranked OFfline Inverse reinforcement learning (TROFI), a novel approach that effectively learns a policy offline without a pre-defined reward function.
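
The description above does not spell out how the reward is recovered, so the following is only a minimal sketch of one common way to learn a reward from ranked trajectories, not the paper's implementation. It assumes a linear reward over state features and a Bradley-Terry pairwise ranking loss. All names (`fit_reward`, `hidden_w`, the toy data) are hypothetical and introduced purely for illustration.

```python
import math
import random


def features_sum(trajectory):
    """Sum the per-step feature vectors of a trajectory (a list of equal-length lists)."""
    return [sum(column) for column in zip(*trajectory)]


def predicted_return(weights, trajectory):
    """Return of a trajectory under the assumed linear reward r(s) = weights . s."""
    return sum(w * f for w, f in zip(weights, features_sum(trajectory)))


def pair_loss_and_grad(weights, worse, better):
    """Bradley-Terry loss -log sigmoid(R(better) - R(worse)) and its gradient."""
    delta = [b - w for b, w in zip(features_sum(better), features_sum(worse))]
    diff = sum(w * d for w, d in zip(weights, delta))
    # Numerically stable softplus(-diff), which equals -log sigmoid(diff).
    loss = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    # sigmoid(-diff), computed without overflow for large |diff|.
    if diff >= 0:
        e = math.exp(-diff)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(diff))
    return loss, [-s * d for d in delta]


def fit_reward(ranked_pairs, dim, lr=0.1, epochs=500):
    """Full-batch gradient descent on the mean pairwise ranking loss."""
    weights = [0.0] * dim
    history = []
    n = len(ranked_pairs)
    for _ in range(epochs):
        total_loss = 0.0
        total_grad = [0.0] * dim
        for worse, better in ranked_pairs:
            loss, grad = pair_loss_and_grad(weights, worse, better)
            total_loss += loss
            total_grad = [t + g for t, g in zip(total_grad, grad)]
        history.append(total_loss / n)
        weights = [w - lr * g / n for w, g in zip(weights, total_grad)]
    return weights, history


# Toy data: 20 unlabeled trajectories of 5 two-dimensional states each.
rng = random.Random(0)
trajectories = [
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
    for _ in range(20)
]

# Hypothetical ground-truth reward, used ONLY to produce a ranking (worst to best).
hidden_w = [1.0, -0.5]
ranked = sorted(trajectories, key=lambda t: predicted_return(hidden_w, t))

# Every (worse, better) pair implied by the ranking.
pairs = [
    (ranked[i], ranked[j])
    for i in range(len(ranked))
    for j in range(i + 1, len(ranked))
]

learned_w, losses = fit_reward(pairs, dim=2)

# Fraction of training pairs that the learned reward orders correctly.
accuracy = sum(
    predicted_return(learned_w, better) > predicted_return(learned_w, worse)
    for worse, better in pairs
) / len(pairs)
```

Under these assumptions, the learned weights could then be used to attach reward labels to the stored transitions, so that a standard offline RL algorithm can be run on the relabeled dataset. Whether TROFI uses this loss or this reward parameterization is not stated in the description here.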

Source

http://arxiv.org/abs/2506.22008v1