sim · medium · manipulation · metric: varies
NovaPlan: Zero-Shot Long-Horizon Manipulation via Closed-Loop Video Language Planning
Description
Solving long-horizon tasks requires robots to integrate high-level semantic reasoning with low-level physical interaction. While vision-language models (VLMs) and video generation models can decompose tasks and imagine outcomes, they often lack the physical grounding necessary for real-world execution. We introduce NovaPlan, a hierarchical framework that unifies closed-loop VLM and video planning with geometrically grounded robot execution for zero-shot long-horizon manipulation. At the high lev