← Back to Benchmarks
simmediumrlmetric · varies
Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration
Description
Cooperative multi-agent reinforcement learning (MARL) systems powered by large language models (LLMs) are frequently optimized via sparse terminal-only feedback. This shared signal entangles upstream decisions, obstructing accurate decision-level credit assignment. To address this trajectory-level diffusion, we introduce Contextual Counterfactual Credit Assignment (\textbf{\texttt{C3}}). Instead of distributing rewards across an entire episode, \textbf{\texttt{C3}} isolates the causal impact of