sim · medium · offline-rl · metric · varies

Online Optimization for Offline Safe Reinforcement Learning

Description

We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online optimization algorithms. We prove the approximate optimality of this approach when integrated with an approximate offline RL oracle and no-regret online optimization. We also present a practical a
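
The minimax framing described above can be illustrated with a small sketch. It is not the paper's algorithm, whose details are not given here; it assumes a Lagrangian of the form L(pi, lam) = R(pi) - lam * (C(pi) - budget), a toy finite set of candidate policies with known reward and cost values, an exact argmax as a hypothetical stand-in for the approximate offline RL oracle, and projected online gradient descent as one possible no-regret learner for the multiplier. All names (`offline_rl_oracle`, `solve_minimax`, `budget`, `lam_max`) are illustrative.

```python
import numpy as np


def offline_rl_oracle(rewards, costs, lam):
    """Hypothetical stand-in for an offline RL oracle.

    Returns the index of the candidate policy that maximizes the
    penalized return r - lam * c. A real oracle would train a policy
    on the fixed dataset using the penalized reward instead.
    """
    return int(np.argmax(rewards - lam * costs))


def solve_minimax(rewards, costs, budget, lam_max=5.0, steps=3000):
    """Alternate oracle best responses with no-regret updates of lam."""
    # Standard fixed-horizon step size for online gradient descent:
    # (diameter of the lam interval) / (gradient bound * sqrt(steps)).
    grad_bound = float(np.max(np.abs(costs - budget)))
    eta = lam_max / (grad_bound * np.sqrt(steps))

    lam = 0.0
    counts = np.zeros(len(rewards))
    for _ in range(steps):
        k = offline_rl_oracle(rewards, costs, lam)
        counts[k] += 1
        # The lam-player minimizes L(pi, lam) = r - lam * (c - budget).
        # Its gradient in lam is -(c - budget), so a descent step raises
        # lam when the chosen policy exceeds the cost budget.
        lam = float(np.clip(lam + eta * (costs[k] - budget), 0.0, lam_max))

    # The uniform mixture over the iterates is the returned policy.
    return counts / steps, lam


# Toy (assumed) values: three candidate policies with known reward and cost.
rewards = np.array([1.0, 2.0, 3.0])
costs = np.array([0.0, 1.0, 4.0])
mixture, lam = solve_minimax(rewards, costs, budget=2.0)
avg_reward = float(mixture @ rewards)
avg_cost = float(mixture @ costs)
print(mixture, lam, avg_reward, avg_cost)
```

In this toy instance no single policy is both feasible and reward-maximizing, so the averaged iterates form a mixture of the second and third policies whose average cost sits close to the budget. Returning the average of the iterates rather than the last one is the usual choice with no-regret dynamics, since the guarantee applies to the time-averaged play.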

Source

http://arxiv.org/abs/2510.22027v1