sim · medium · offline-rl · metric · varies

Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL

Description

Offline safe reinforcement learning (OSRL), which derives constraint-satisfying policies from pre-collected datasets, offers a promising avenue for deploying RL in safety-critical real-world domains such as robotics. However, most existing approaches emphasize only short-term safety and neglect long-horizon considerations. Consequently, they may violate safety constraints and fail to ensure sustained protection during online deployment. Moreover, the learned policies often struggle to handle

Source

http://arxiv.org/abs/2505.08179v2