sim · medium · robotics · metric: varies
Drift-Based Policy Optimization: Native One-Step Policy Learning for Online Robot Control
Description
Although multi-step generative policies achieve strong performance in robotic manipulation by modeling multimodal action distributions, they require multi-step iterative denoising at inference time. Each action therefore needs tens to hundreds of network function evaluations (NFEs), making these policies costly for high-frequency closed-loop control and online reinforcement learning (RL). To address this limitation, we propose a two-stage framework for native one-step generative policies that shifts refinement
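The inference-cost gap motivating this work can be illustrated with a minimal sketch. The network stand-in, dimensions, and step schedule below are hypothetical placeholders, not the paper's method; the point is only that iterative denoising spends one NFE per refinement step, while a native one-step policy spends exactly one NFE per action.

```python
import math
import random

NFE = {"count": 0}  # global counter of network function evaluations

def network(action, obs, t):
    """Stand-in for a learned denoiser; each call costs one NFE."""
    NFE["count"] += 1
    target = math.tanh(sum(obs))  # hypothetical "clean" action direction
    return [a - (a - target) / (t + 2) for a in action]

def multi_step_policy(obs, n_steps=50):
    # Iterative denoising: refine a noise sample over n_steps NFEs.
    action = [random.gauss(0, 1) for _ in range(7)]
    for t in range(n_steps):
        action = network(action, obs, t)
    return action

def one_step_policy(obs):
    # Native one-step generation: a single NFE maps noise to an action.
    noise = [random.gauss(0, 1) for _ in range(7)]
    return network(noise, obs, 0)
```

Running both on the same observation shows the multi-step policy consuming `n_steps` NFEs per action versus one for the one-step policy, which is the cost that dominates high-frequency closed-loop control.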