sim · medium · policy-learning · metric · varies
DreamPlan: Efficient Reinforcement Fine-Tuning of Vision-Language Planners via Video World Models
Description
Robotic manipulation requires sophisticated commonsense reasoning, a capability naturally possessed by large-scale Vision-Language Models (VLMs). While VLMs show promise as zero-shot planners, their lack of grounded physical understanding often leads to compounding errors and low success rates when deployed in complex real-world environments, particularly for challenging tasks like deformable object manipulation. Although Reinforcement Learning (RL) can adapt these planners to specific task dyna