sim · medium · offline-rl · metric · varies
ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts
Description
Offline reinforcement learning (RL) enables agents to learn optimal policies from pre-collected datasets. However, datasets containing suboptimal and fragmented trajectories present challenges for reward propagation, resulting in inaccurate value estimation and degraded policy performance. While trajectory stitching via generative models offers a promising solution, existing augmentation methods frequently produce trajectories that are either confined to the support of the behavior policy or vio