sim · medium · offline-rl · metric: varies

Oryx: a Scalable Sequence Model for Many-Agent Coordination in Offline MARL

Description

A key challenge in offline multi-agent reinforcement learning (MARL) is achieving effective many-agent, multi-step coordination in complex environments. In this work, we propose Oryx, a novel algorithm for offline cooperative MARL that directly addresses this challenge. Oryx adapts the recently proposed retention-based architecture Sable and combines it with a sequential form of implicit constraint Q-learning (ICQ) to develop a novel offline autoregressive policy update scheme. […]
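The autoregressive policy update described above factorises the joint policy so that each agent's action is conditioned on the actions already chosen by earlier agents in the sequence. The following is a minimal, hypothetical sketch of that action-selection pattern; the function names and the toy policy are illustrative assumptions, not Oryx's actual architecture.

```python
import math
import random

def autoregressive_actions(obs, policy_fn, n_actions, rng):
    """Select a joint action one agent at a time.

    Each agent's action distribution conditions on the actions already
    taken by earlier agents (the autoregressive factorisation); names
    and signatures here are illustrative, not from the Oryx codebase.
    """
    actions = []
    for agent_obs in obs:
        logits = policy_fn(agent_obs, actions)
        # Softmax over the agent's logits.
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        probs = [w / total for w in weights]
        actions.append(rng.choices(range(n_actions), weights=probs)[0])
    return actions

# Toy stand-in policy: discourages repeating earlier agents' actions,
# showing how later agents react to the growing action prefix.
def toy_policy(agent_obs, prev_actions):
    logits = [float(agent_obs)] * 3
    for a in prev_actions:
        logits[a] -= 2.0
    return logits

rng = random.Random(0)
joint_action = autoregressive_actions([0.5, 0.1, 0.9], toy_policy, 3, rng)
```

Because each agent sees the prefix of prior actions, coordination emerges sequentially rather than requiring all agents to act independently from a shared observation alone.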

Source

http://arxiv.org/abs/2505.22151v2