sim · medium · manipulation-data · metric: varies

Cosmos-H-Surgical: Learning Surgical Robot Policies from Videos via World Modeling

Description

Data scarcity remains a fundamental barrier to fully autonomous surgical robots. Large-scale vision-language-action (VLA) models have shown impressive generalization in household and industrial manipulation by leveraging paired video-action data from diverse domains. Surgical robotics, by contrast, suffers from a scarcity of datasets that include both visual observations and accurate robot kinematics. Vast corpora of surgical videos do exist, but they lack corresponding action labels…

Source

http://arxiv.org/abs/2512.23162v4