sim · medium · offline-rl · metric · varies

Sample-Efficient Policy Space Response Oracles with Joint Experience Best Response

Description

Multi-agent reinforcement learning (MARL) offers a scalable alternative to exact game-theoretic analysis but suffers from non-stationarity and the need to maintain diverse populations of strategies that capture non-transitive interactions. Policy Space Response Oracles (PSRO) address these issues by iteratively expanding a restricted game with approximate best responses (BRs), yet per-agent BR training makes it prohibitively expensive in many-agent or simulator-expensive settings. We introduce J
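To make the PSRO loop mentioned above concrete, here is a minimal sketch of generic PSRO on rock-paper-scissors. It does not reproduce the method this paper introduces. Everything in it is an illustrative assumption: the names `RPS`, `fictitious_play`, and `psro` are hypothetical, the meta-solver is plain fictitious play, and the oracle is an exact best response over the three actions rather than a trained RL policy.

```python
from typing import List, Tuple

# Row player's payoffs in rock-paper-scissors (zero-sum; the column player gets the negation).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def fictitious_play(sub: List[List[float]], iters: int = 2000) -> Tuple[List[float], List[float]]:
    """Approximate an equilibrium of the restricted zero-sum game (assumed meta-solver)."""
    n, m = len(sub), len(sub[0])
    row_counts, col_counts = [0] * n, [0] * m
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iters):
        # Each side best-responds to the opponent's empirical action counts.
        row_vals = [sum(sub[i][j] * col_counts[j] for j in range(m)) for i in range(n)]
        col_vals = [sum(sub[i][j] * row_counts[i] for i in range(n)) for j in range(m)]
        row_counts[row_vals.index(max(row_vals))] += 1
        col_counts[col_vals.index(min(col_vals))] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return [c / row_total for c in row_counts], [c / col_total for c in col_counts]


def psro(payoff: List[List[float]], max_iters: int = 10) -> Tuple[List[int], List[int]]:
    """Grow each player's population with best responses to the current meta-strategies."""
    row_pop, col_pop = [0], [0]  # both players start with a single strategy (rock)
    for _ in range(max_iters):
        # Restricted game: payoffs between strategies already in the populations.
        sub = [[payoff[i][j] for j in col_pop] for i in row_pop]
        row_meta, col_meta = fictitious_play(sub)
        # Oracle step: exact best response over the full action set (stand-in for RL training).
        row_vals = [sum(payoff[a][j] * p for j, p in zip(col_pop, col_meta))
                    for a in range(len(payoff))]
        col_vals = [sum(payoff[i][a] * p for i, p in zip(row_pop, row_meta))
                    for a in range(len(payoff[0]))]
        row_br = row_vals.index(max(row_vals))
        col_br = col_vals.index(min(col_vals))  # column player minimizes the row payoff
        added = False
        if row_br not in row_pop:
            row_pop.append(row_br)
            added = True
        if col_br not in col_pop:
            col_pop.append(col_br)
            added = True
        if not added:  # neither oracle found a new strategy: the restricted game is closed
            break
    return row_pop, col_pop


row_pop, col_pop = psro(RPS)
```

In this toy game each oracle call is a cheap table lookup. In the settings the description refers to, each best response is a separate RL training run per agent, which is the cost that grows with the number of agents and with simulator expense.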

Source

http://arxiv.org/abs/2602.06599v1