sim · medium · offline-rl · metric · varies

Offline Multi-agent Reinforcement Learning via Score Decomposition

Description

Offline cooperative multi-agent reinforcement learning (MARL) faces distinctive challenges from distributional shift, driven in particular by the high dimensionality of joint action spaces and the selection of out-of-distribution joint actions. In this work, we highlight that a fundamental challenge in offline MARL arises from the multi-equilibrium nature of cooperative tasks, which induces a highly multimodal joint behavior policy space coupled with behavior data of heterogeneous quality.
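
The multi-equilibrium issue can be illustrated with a toy example. The sketch below is not the method of this work; it assumes a hypothetical two-agent, two-action coordination game in which the dataset contains only the two coordinated joint actions, (0, 0) and (1, 1), in equal proportion. Fitting each agent's behavior policy independently (a plain per-agent frequency estimate, chosen here only for illustration) and multiplying the marginals places probability mass on joint actions that never occur in the data.

```python
from collections import Counter
from itertools import product

ACTIONS = (0, 1)

# Hypothetical offline dataset: two equally good equilibria, (0, 0) and (1, 1).
# Mismatched joint actions such as (0, 1) never appear.
dataset = [(0, 0)] * 50 + [(1, 1)] * 50


def joint_distribution(data):
    """Empirical distribution over joint actions (multimodal here)."""
    counts = Counter(data)
    return {a: counts.get(a, 0) / len(data) for a in product(ACTIONS, repeat=2)}


def factorized_distribution(data):
    """Product of independently estimated per-agent marginals."""
    marginals = []
    for agent in range(2):
        counts = Counter(joint_action[agent] for joint_action in data)
        marginals.append({k: counts.get(k, 0) / len(data) for k in ACTIONS})
    return {
        a: marginals[0][a[0]] * marginals[1][a[1]]
        for a in product(ACTIONS, repeat=2)
    }


joint = joint_distribution(dataset)
factored = factorized_distribution(dataset)

# Mass that the factorized policy assigns to joint actions absent from the data.
ood_mass = sum(p for a, p in factored.items() if joint[a] == 0.0)
print(ood_mass)  # 0.5
```

In this toy setting, half of the factorized policy's probability mass falls on out-of-distribution joint actions, even though every individual action is well covered by the data.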

Source

http://arxiv.org/abs/2505.05968v2