
When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift

Description

Real-world reinforcement learning systems must operate under distributional drift in their observation streams, yet most policy architectures implicitly assume fully observed, noise-free states. We study the robustness of Proximal Policy Optimization (PPO) under temporally persistent sensor failures that induce partial observability and representation shift. To respond to this drift, we augment PPO with temporal sequence models, including Transformers and State Space Models (SSMs), to enable policies to integrate information across observation histories.
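The failure model described above can be illustrated with a small sketch. This is a hypothetical simulation of temporally persistent sensor failures, not the paper's exact protocol: each sensor independently enters a failed state according to a two-state Markov chain, and while failed it emits a stale reading with slow Gaussian drift, producing the partial observability and representation shift the abstract refers to.

```python
import numpy as np

class PersistentSensorDrift:
    """Apply temporally persistent sensor failures to observation vectors.

    Illustrative assumption (not from the paper): each sensor follows a
    two-state Markov chain, failing with probability p_fail per step and
    recovering with probability p_recover, so failures persist over time.
    A failed sensor returns its last healthy reading plus drift noise.
    """

    def __init__(self, obs_dim, p_fail=0.05, p_recover=0.1,
                 drift_std=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.failed = np.zeros(obs_dim, dtype=bool)   # current failure mask
        self.stale = np.zeros(obs_dim)                # frozen readings
        self.p_fail = p_fail
        self.p_recover = p_recover
        self.drift_std = drift_std

    def __call__(self, obs):
        obs = np.asarray(obs, dtype=float)
        start = self.rng.random(obs.shape) < self.p_fail
        stop = self.rng.random(obs.shape) < self.p_recover
        # Sensors that fail this step freeze their current reading.
        newly_failed = start & ~self.failed
        self.stale = np.where(newly_failed, obs, self.stale)
        # Update failure mask: persist unless recovered, or newly failed.
        self.failed = (self.failed & ~stop) | start
        # Failed readings drift slowly instead of tracking the true state.
        self.stale += self.rng.normal(0.0, self.drift_std, obs.shape) * self.failed
        return np.where(self.failed, self.stale, obs)
```

Such a wrapper can be placed between the environment and the agent; because failures persist across steps, a memoryless policy cannot distinguish a stale reading from a fresh one, which is the motivation for adding temporal sequence models to the PPO policy.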

Source

http://arxiv.org/abs/2603.04648v2