sim · medium · locomotion · metric: varies

DreamToNav: Generalizable Navigation for Robots via Generative Video Planning

Description

We present DreamToNav, a novel autonomous robot framework that uses generative video models to enable intuitive, human-in-the-loop control. Instead of relying on rigid waypoint navigation, users provide natural-language prompts (e.g., "Follow the person carefully"), which the system translates into executable motion. Our pipeline first employs Qwen 2.5-VL-7B-Instruct to refine vague user instructions into precise visual descriptions. These descriptions condition NVIDIA Cosmos 2.5, a state-of-the-art generative video model […]
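The two-stage pipeline described above (instruction refinement, then video-conditioned planning) can be sketched as follows. This is a minimal illustration with stub functions standing in for the real models: the actual system invokes Qwen 2.5-VL-7B-Instruct and NVIDIA Cosmos 2.5, and the function names, prompt template, and frame-list output here are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the DreamToNav pipeline (stubs only).
# Stage 1 stands in for Qwen 2.5-VL-7B-Instruct; Stage 2 stands
# in for the NVIDIA Cosmos 2.5 generative video model.

def refine_instruction(user_prompt: str) -> str:
    """Stage 1 stub: turn a vague natural-language command into a
    precise visual description (a trivial template here; the real
    system queries a vision-language model)."""
    return f"A robot's first-person view while it executes: {user_prompt}."

def plan_from_description(description: str, n_frames: int = 3) -> list[str]:
    """Stage 2 stub: the generative video model would return video
    frames to plan against; here we return placeholder sub-goals."""
    return [f"frame {i}: {description}" for i in range(n_frames)]

user_prompt = "Follow the person carefully"
description = refine_instruction(user_prompt)
plan = plan_from_description(description)
print(len(plan))  # number of placeholder sub-goals
```

Downstream, each generated frame would be translated into executable motion commands; that controller stage is omitted here.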

Source

http://arxiv.org/abs/2603.06190v1