sim · medium · offline-rl · metric · varies

ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts

Description

Offline reinforcement learning (RL) enables agents to learn optimal policies from pre-collected datasets. However, datasets containing suboptimal and fragmented trajectories present challenges for reward propagation, resulting in inaccurate value estimation and degraded policy performance. While trajectory stitching via generative models offers a promising solution, existing augmentation methods frequently produce trajectories that are either confined to the support of the behavior policy or vio
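The reward-propagation problem described above can be illustrated with a toy tabular sketch. This is not ASTRO's method or code. It assumes a hypothetical six-state chain, two disconnected dataset fragments, and a hand-written connecting transition standing in for a stitched, dynamics-consistent rollout. The function name `dataset_values` and all states and rewards are invented for illustration.

```python
from collections import defaultdict


def dataset_values(transitions, gamma=0.9, sweeps=50):
    """Tabular value iteration restricted to transitions present in the dataset.

    `transitions` is a list of (state, reward, next_state) tuples. States that
    never appear as a source state are treated as terminal with value 0.
    """
    successors = defaultdict(list)
    for state, reward, next_state in transitions:
        successors[state].append((reward, next_state))

    values = defaultdict(float)
    for _ in range(sweeps):
        for state, outcomes in successors.items():
            # Back up only through transitions the dataset actually contains.
            values[state] = max(r + gamma * values[s_next] for r, s_next in outcomes)
    return values


# Hypothetical data: fragment A never reaches the rewarding region,
# fragment B does, and no logged transition connects state 2 to state 3.
fragment_a = [(0, 0.0, 1), (1, 0.0, 2)]
fragment_b = [(3, 0.0, 4), (4, 1.0, 5)]

# Assumed stitched transition that bridges the two fragments.
stitch = [(2, 0.0, 3)]

v_fragmented = dataset_values(fragment_a + fragment_b)
v_stitched = dataset_values(fragment_a + fragment_b + stitch)

print("V(0) without stitching:", v_fragmented[0])
print("V(0) with stitching:   ", v_stitched[0])
```

Without the bridging transition, the reward collected in fragment B never reaches the states of fragment A, so their estimated values stay at zero. With it, the value of state 0 becomes the discounted reward, `gamma ** 4` up to rounding. The sketch only holds if the added transition respects the true dynamics, which is the constraint the description says existing augmentation methods can fail to meet.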

Source

http://arxiv.org/abs/2511.23442v2