sim · medium · robotics · metric: varies
Drift-Based Policy Optimization: Native One-Step Policy Learning for Online Robot Control
Description
Although multi-step generative policies achieve strong performance in robotic manipulation by modeling multimodal action distributions, they require multi-step iterative denoising at inference time. Each action therefore needs tens to hundreds of network function evaluations (NFEs), making these policies costly for high-frequency closed-loop control and online reinforcement learning (RL). To address this limitation, we propose a two-stage framework for native one-step generative policies that shifts refinement
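The inference-cost gap motivating this work can be illustrated with a minimal sketch. The network stand-in, dimensions, and step schedule below are hypothetical placeholders, not the paper's method; the point is only that iterative denoising spends one NFE per refinement step, while a native one-step policy spends exactly one NFE per action.

```python
import math
import random

NFE = {"count": 0}  # global counter of network function evaluations

def network(action, obs, t):
    """Stand-in for a learned denoiser; each call costs one NFE."""
    NFE["count"] += 1
    target = math.tanh(sum(obs))  # hypothetical "clean" action direction
    return [a - (a - target) / (t + 2) for a in action]

def multi_step_policy(obs, n_steps=50):
    # Iterative denoising: refine a noise sample over n_steps NFEs.
    action = [random.gauss(0, 1) for _ in range(7)]
    for t in range(n_steps):
        action = network(action, obs, t)
    return action

def one_step_policy(obs):
    # Native one-step generation: a single NFE maps noise to an action.
    noise = [random.gauss(0, 1) for _ in range(7)]
    return network(noise, obs, 0)
```

Running both on the same observation shows the multi-step policy consuming `n_steps` NFEs per action versus one for the one-step policy, which is the cost that dominates high-frequency closed-loop control.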