Tags: sim · medium · offline-rl · Metric: varies
Enhancing Robustness of Offline Reinforcement Learning Under Data Corruption via Sharpness-Aware Minimization
Description
Offline reinforcement learning (RL) is vulnerable to real-world data corruption, with even robust algorithms failing under challenging observation and mixture corruptions. We posit that this failure stems from data corruption creating sharp minima in the loss landscape, leading to poor generalization. To address this, we are the first to apply Sharpness-Aware Minimization (SAM) as a general-purpose, plug-and-play optimizer for offline RL. SAM seeks flatter minima, guiding models toward more robust parameters.
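
The description treats SAM as a plug-and-play wrapper around a standard optimizer: first perturb the parameters toward the locally worst-case (sharpest) direction within a small radius, then update using the gradient computed at that perturbed point, which biases training toward flat minima. The sketch below illustrates this two-step update in PyTorch; the class, the `rho` radius, and the `compute_loss` callable in the usage comment are illustrative assumptions, not the benchmark's or the paper's actual implementation.

```python
import torch


class SAM(torch.optim.Optimizer):
    """Minimal sketch: wraps a base optimizer with the two-step SAM update."""

    def __init__(self, params, base_optimizer_cls, rho=0.05, **kwargs):
        defaults = dict(rho=rho, **kwargs)
        super().__init__(params, defaults)
        # The base optimizer (e.g. Adam) shares SAM's parameter groups.
        self.base_optimizer = base_optimizer_cls(self.param_groups, **kwargs)

    @torch.no_grad()
    def first_step(self):
        # Ascend to the approximate worst-case point inside an L2 ball
        # of radius rho around the current parameters.
        grad_norm = torch.norm(torch.stack([
            p.grad.norm(p=2)
            for group in self.param_groups
            for p in group["params"] if p.grad is not None
        ]), p=2)
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)
            for p in group["params"]:
                if p.grad is None:
                    continue
                e_w = p.grad * scale
                p.add_(e_w)                      # w -> w + e(w)
                self.state[p]["e_w"] = e_w

    @torch.no_grad()
    def second_step(self):
        # Undo the perturbation, then step the base optimizer using the
        # gradient evaluated at the perturbed point.
        for group in self.param_groups:
            for p in group["params"]:
                e_w = self.state[p].pop("e_w", None)
                if e_w is not None:
                    p.sub_(e_w)
        self.base_optimizer.step()


# Hypothetical usage inside an offline-RL update (e.g. a critic loss);
# `compute_loss` is an assumed placeholder for the algorithm's loss:
#   optimizer = SAM(critic.parameters(), torch.optim.Adam, rho=0.05, lr=3e-4)
#   compute_loss(batch).backward()
#   optimizer.first_step()
#   optimizer.zero_grad()
#   compute_loss(batch).backward()   # loss at the perturbed parameters
#   optimizer.second_step()
#   optimizer.zero_grad()
```

Because the wrapper only changes how gradients are applied, it can in principle be dropped into existing offline-RL training loops without modifying the loss itself, which is the sense in which the description calls SAM "plug-and-play".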