sim · medium · robotics · metric · varies

Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation?

Description

Video generation models have advanced rapidly and are beginning to show a strong understanding of physical dynamics. In this paper, we investigate how far an advanced video generation model such as Veo-3 can support generalizable robotic manipulation. We first study a zero-shot approach in which Veo-3 predicts future image sequences from current robot observations, while an inverse dynamics model (IDM) recovers the corresponding robot actions. The IDM is trained solely on random-play data, requiri
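
The two-stage idea in the description (a video model predicts future frames, then an inverse dynamics model maps consecutive frames to actions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_video_model`, `toy_idm`, and `frames_to_actions` are hypothetical names, frames are plain lists of floats rather than images, and neither stub reflects Veo-3 or the IDM used in the paper.

```python
from typing import Callable, List

Frame = List[float]   # toy stand-in for an image: a flat feature vector
Action = List[float]  # toy stand-in for a robot command


def toy_video_model(observation: Frame, horizon: int) -> List[Frame]:
    """Hypothetical placeholder for a video generator.

    Shifts every feature by 0.1 per predicted step; a real model would
    generate images conditioned on the observation (and usually a prompt).
    """
    return [[x + 0.1 * (t + 1) for x in observation] for t in range(horizon)]


def toy_idm(frame_before: Frame, frame_after: Frame) -> Action:
    """Hypothetical placeholder for a learned inverse dynamics model.

    Returns the per-feature difference between two consecutive frames.
    """
    return [after - before for before, after in zip(frame_before, frame_after)]


def frames_to_actions(
    observation: Frame,
    horizon: int,
    video_model: Callable[[Frame, int], List[Frame]],
    idm: Callable[[Frame, Frame], Action],
) -> List[Action]:
    """Predict future frames, then recover one action per frame transition."""
    predicted = video_model(observation, horizon)
    # Prepend the current observation so the first action explains the
    # transition from "now" to the first predicted frame.
    frames = [list(observation)] + predicted
    return [idm(prev, nxt) for prev, nxt in zip(frames, frames[1:])]


actions = frames_to_actions([0.0, 1.0], 3, toy_video_model, toy_idm)
```

With a horizon of 3, the sketch yields three actions, one per transition between consecutive frames. Passing the two models in as callables keeps the action-recovery loop independent of whichever generator or IDM is plugged in.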

Source

http://arxiv.org/abs/2604.04502v1