sim · medium · offline-rl · metric: varies
Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL
Description
Offline safe reinforcement learning (OSRL), which derives constraint-satisfying policies from pre-collected datasets, offers a promising avenue for deploying RL in safety-critical real-world domains such as robotics. However, most existing approaches emphasize only short-term safety and neglect long-horizon considerations. Consequently, they may violate safety constraints and fail to ensure sustained protection during online deployment. Moreover, the learned policies often struggle to handle