sim · medium · manipulation · metric · varies
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
Description
Recent video generation models demonstrate a remarkable ability to capture complex physical interactions and scene evolution over time. To leverage their spatiotemporal priors, prior robotics work has adapted video models for policy learning, but these approaches introduce complexity by requiring multiple stages of post-training and new architectural components for action generation. In this work, we introduce Cosmos Policy, a simple approach for adapting a large pretrained video model (Cosmos-Predict2) into an effecti