sim · medium · manipulation-data · metric · varies
Cosmos-H-Surgical: Learning Surgical Robot Policies from Videos via World Modeling
Description
Data scarcity remains a fundamental barrier to achieving fully autonomous surgical robots. While large-scale vision-language-action (VLA) models have shown impressive generalization in household and industrial manipulation by leveraging paired video-action data from diverse domains, surgical robotics suffers from a paucity of datasets that include both visual observations and accurate robot kinematics. In contrast, vast corpora of surgical videos exist, but they lack corresponding action label