sim · medium · rl · metric · varies

Adversarial Latent-State Training for Robust Policies in Partially Observable Domains

Description

Robustness under latent distribution shift remains challenging in partially observable reinforcement learning. We formalize a focused setting where an adversary selects a hidden initial latent distribution before the episode, termed an adversarial latent-initial-state POMDP. Theoretically, we prove a latent minimax principle, characterize worst-case defender distributions, and derive approximate best-response inequalities with finite-sample concentration bounds that make the optimization and sam
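As a rough illustration of the setting described above, the sketch below treats the adversary's choice as a distribution over a small finite set of latent initial states. Because a fixed policy's expected return is linear in that distribution, the worst case is attained at a point mass on a single latent state. The value table, the function names, and the restriction to pure (non-randomized) defender policies are all assumptions made for this example; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical table: VALUE_TABLE[i][z] is the expected return of policy i
# when the episode starts in latent state z. The numbers are made up.
VALUE_TABLE: List[List[float]] = [
    [0.9, 0.2, 0.6],
    [0.5, 0.5, 0.4],
    [0.7, 0.3, 0.8],
]


def expected_value(values: Sequence[float], mu: Sequence[float]) -> float:
    """Expected return of one policy under latent initial distribution mu."""
    return sum(m * v for m, v in zip(mu, values))


def adversary_best_response(values: Sequence[float]) -> Tuple[int, float]:
    """Worst latent state for a fixed policy.

    The objective is linear in mu, so minimizing over the simplex
    reduces to picking the single latent state with the lowest value.
    """
    worst = min(range(len(values)), key=lambda z: values[z])
    return worst, values[worst]


def maximin_pure_policy(table: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Pure policy with the best worst-case value (assumed simplification)."""
    best_policy, best_value = -1, float("-inf")
    for i, row in enumerate(table):
        _, worst_value = adversary_best_response(row)
        if worst_value > best_value:
            best_policy, best_value = i, worst_value
    return best_policy, best_value


if __name__ == "__main__":
    policy, value = maximin_pure_policy(VALUE_TABLE)
    print(f"maximin pure policy: {policy}, worst-case value: {value}")
```

Restricting the defender to pure policies keeps the example dependency-free; allowing randomized policies would in general require solving a small linear program and can yield a higher worst-case value.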

Source

http://arxiv.org/abs/2603.07313v3