sim · medium · offline-rl · metric · varies

From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning

Description

Offline Reinforcement Learning (RL) aims to learn effective policies from a static dataset without requiring further agent-environment interactions. However, its practical adoption is often hindered by the need for explicit reward annotations, which can be costly to engineer or difficult to obtain retrospectively. To address this, we propose ReLOAD (Reinforcement Learning with Offline Reward Annotation via Distillation), a novel reward annotation framework for offline RL. Unlike existing methods

Source

http://arxiv.org/abs/2507.12815v2