sim · medium · offline-rl · metric · varies

Epigraph-Guided Flow Matching for Safe and Performant Offline Reinforcement Learning

Description

Offline reinforcement learning (RL) provides a compelling paradigm for training autonomous systems without the risks of online exploration, particularly in safety-critical domains. However, jointly achieving strong safety and performance from fixed datasets remains challenging. Existing safe offline RL methods often rely on soft constraints that allow violations, introduce excessive conservatism, or struggle to balance safety, reward optimization, and adherence to the data distribution. To addre

Source

http://arxiv.org/abs/2602.08054v1