← Back to Benchmarks
simmediumrlmetric · varies
Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO
Description
We present Territory Paint Wars, a minimal competitive multi-agent reinforcement learning environment implemented in Unity, and use it to systematically investigate failure modes of Proximal Policy Optimisation (PPO) under self-play. A first agent trained for $84{,}000$ episodes achieves only $26.8\%$ win rate against a uniformly-random opponent in a symmetric zero-sum game. Through controlled ablations we identify five implementation-level failure modes -- reward-scale imbalance, missing termin