Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning

Description

Sequential decision making using Markov Decision Processes underpins many real-world applications. Both model-based and model-free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization with safety constraints, often conflicting objectives, which can lead to unstable min-max adversarial optimization. A promising alternative is safety reachability analysis, which precomputes a forward-invariant safe state-action set, ensuring that an agent

Source

http://arxiv.org/abs/2603.22292v2