sim · medium · manipulation · metric · varies

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

Description

Recent video generation models demonstrate remarkable ability to capture complex physical interactions and scene evolution over time. To leverage their spatiotemporal priors, robotics works have adapted video models for policy learning but introduce complexity by requiring multiple stages of post-training and new architectural components for action generation. In this work, we introduce Cosmos Policy, a simple approach for adapting a large pretrained video model (Cosmos-Predict2) into an effecti

Source

http://arxiv.org/abs/2601.16163v1