sim · medium · offline-rl · metric: varies

Conditional Sequence Modeling for Safe Reinforcement Learning

Description

Offline safe reinforcement learning (RL) aims to learn policies from a fixed dataset while maximizing performance under cumulative cost constraints. In practice, deployment requirements often vary across scenarios, necessitating a single policy that can adapt zero-shot to different cost thresholds. However, most existing offline safe RL methods are trained under a pre-specified threshold, yielding policies with limited generalization and deployment flexibility across cost thresholds. Motivated by …
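The description is cut off before the method itself, so the following is only a hedged illustration of how conditional sequence models are commonly made threshold-aware. It follows the style of prior work such as the Constrained Decision Transformer and is not necessarily this paper's approach. In that style, each timestep is conditioned on a reward-to-go and a cost-to-go. During training, these targets come from the logged trajectory (hindsight relabeling). At deployment, the cost-to-go starts at the desired threshold and shrinks as cost is incurred. Under this scheme, changing the threshold changes only the conditioning input, not the model weights. All function names below (`to_go`, `training_conditions`, `budget_trace`, `satisfies_threshold`) are hypothetical, and returns are undiscounted for simplicity.

```python
from typing import List, Tuple


def to_go(values: List[float]) -> List[float]:
    """Suffix sums: entry t is sum(values[t:]), i.e. the undiscounted to-go quantity."""
    out = [0.0] * len(values)
    running = 0.0
    for t in range(len(values) - 1, -1, -1):
        running += values[t]
        out[t] = running
    return out


def training_conditions(rewards: List[float], costs: List[float]) -> List[Tuple[float, float]]:
    """Hindsight conditioning targets for each timestep of a logged trajectory:
    (reward-to-go, cost-to-go). A sequence model would be trained to predict the
    logged action given these targets plus the state history (not shown here)."""
    return list(zip(to_go(rewards), to_go(costs)))


def budget_trace(threshold: float, costs: List[float]) -> List[float]:
    """Remaining cost budget fed to the model before each action at deployment.
    Starts at the user-chosen threshold and is decremented by observed cost.
    Clipping at zero is an assumption, so the model never sees a negative budget."""
    budgets = []
    spent = 0.0
    for c in costs:
        budgets.append(max(threshold - spent, 0.0))
        spent += c
    return budgets


def satisfies_threshold(costs: List[float], threshold: float) -> bool:
    """Cumulative-cost constraint check for one episode."""
    return sum(costs) <= threshold


if __name__ == "__main__":
    rewards, costs = [1.0, 0.5, 2.0], [0.0, 1.0, 0.5]
    print(training_conditions(rewards, costs))  # [(3.5, 1.5), (2.5, 1.5), (2.0, 0.5)]
    print(budget_trace(2.0, costs))             # [2.0, 2.0, 1.0]
    print(satisfies_threshold(costs, 2.0))      # True
```

A stricter deployment threshold (say 1.0 instead of 2.0) would simply start `budget_trace` at a smaller value. No retraining is involved, which is the kind of zero-shot adaptation across thresholds the description motivates.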

Source

http://arxiv.org/abs/2602.08584v1