simmediumoffline-rlmetric · varies

Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning

Description

We address the challenge of offline reinforcement learning using realistic data, specifically non-expert data collected through sub-optimal behavior policies. Under such circumstance, the learned policy must be safe enough to manage distribution shift while maintaining sufficient flexibility to deal with non-expert (bad) demonstrations from offline data.To tackle this issue, we introduce a novel method called Outcome-Driven Action Flexibility (ODAF), which seeks to reduce reliance on the empiric

Source

http://arxiv.org/abs/2504.01719v2