sim, medium, offline-rl, metric · varies
DAWM: Diffusion Action World Models for Offline Reinforcement Learning via Action-Inferred Transitions
Description
Diffusion-based world models have demonstrated strong capabilities in synthesizing realistic long-horizon trajectories for offline reinforcement learning (RL). However, many existing methods do not directly generate actions alongside states and rewards, limiting their compatibility with standard value-based offline RL algorithms that rely on one-step temporal difference (TD) learning. While prior work has explored joint modeling of states, rewards, and actions to address this issue, such formula