
Model Predictive Control with Differentiable World Models for Offline Reinforcement Learning

Description

Offline Reinforcement Learning (RL) aims to learn optimal policies from fixed offline datasets, without further interactions with the environment. Such methods train an offline policy (or value function) and apply it at inference time without further refinement. We introduce an inference-time adaptation framework inspired by model predictive control (MPC) that utilizes a pretrained policy along with a learned world model of state transitions and rewards. While existing world model and diffusion …
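The core idea described above — using a learned world model to adapt a pretrained policy's actions at inference time — can be sketched with a simple random-shooting MPC loop. The `policy` and `world_model` functions below are hypothetical toy stand-ins (not the paper's actual learned components), used only to show the planning structure: perturb the policy's proposed actions, roll each candidate through the model, and execute the first action of the best-scoring rollout.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(state):
    # Stand-in for a pretrained offline policy (illustrative only):
    # maps a 4-dim state to a 2-dim action.
    return np.tanh(state[:2])

def world_model(state, action):
    # Stand-in for a learned model of transitions and rewards:
    # a toy linear system with a quadratic cost.
    next_state = 0.9 * state + np.concatenate([action, np.zeros(2)])
    reward = -np.sum(next_state ** 2)
    return next_state, reward

def mpc_plan(state, horizon=5, n_candidates=64, noise=0.3):
    """Random-shooting MPC around the pretrained policy: sample
    noisy action sequences, evaluate each with the world model,
    and return the first action of the highest-return sequence."""
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        s, total, first = state.copy(), 0.0, None
        for t in range(horizon):
            a = policy(s) + noise * rng.standard_normal(2)
            if t == 0:
                first = a
            s, r = world_model(s, a)
            total += r
        if total > best_return:
            best_return, best_action = total, first
    return best_action

action = mpc_plan(np.zeros(4))
print(action.shape)  # (2,)
```

This is only a sketch of the MPC-style inference-time loop; the paper's method presumably differs in how candidates are generated and scored.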

Source

http://arxiv.org/abs/2603.22430v1