sim · medium · policy-learning · metric · varies

DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models

Description

Robotic manipulation requires sophisticated commonsense reasoning, a capability naturally possessed by large-scale Vision-Language Models (VLMs). While VLMs show promise as zero-shot planners, their lack of grounded physical understanding often leads to compounding errors and low success rates when deployed in complex real-world environments, particularly for challenging tasks like deformable object manipulation. Although Reinforcement Learning (RL) can adapt these planners to specific task dyna

Source

http://arxiv.org/abs/2603.16860v1