← Back to Benchmarks
simmediumoffline-rlmetric · varies
RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization
Description
In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) offers an attractive alternative but only if policies deliver high returns without incurring catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of value or model-based pessimism, and restricted policy classes that limit policy expressiveness, whereas diffusion/flow-based expressive generative policies trained with a behavioral-cloning (BC) objecti