From Video to Control: A Survey of Learning Manipulation Interfaces from Temporal Visual Data

Description

Video is a scalable observation of physical dynamics: it captures how objects move, how contact unfolds, and how scenes evolve under interaction -- all without requiring robot action labels. Yet translating this temporal structure into reliable robotic control remains an open challenge, because video lacks action supervision and differs from robot experience in embodiment, viewpoint, and physical constraints. This survey reviews methods that exploit non-action-annotated temporal video to learn c

Source

http://arxiv.org/abs/2604.04974v1