sim · medium · manipulation · metric · varies

NovaPlan: Zero-Shot Long-Horizon Manipulation via Closed-Loop Video Language Planning

Description

Solving long-horizon tasks requires robots to integrate high-level semantic reasoning with low-level physical interaction. While vision-language models (VLMs) and video generation models can decompose tasks and imagine outcomes, they often lack the physical grounding necessary for real-world execution. We introduce NovaPlan, a hierarchical framework that unifies closed-loop VLM and video planning with geometrically grounded robot execution for zero-shot long-horizon manipulation. At the high lev

Source

http://arxiv.org/abs/2602.20119v1