sim · medium · robotics · metric: varies
Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation?
Description
Video generation models have advanced rapidly and are beginning to show a strong understanding of physical dynamics. In this paper, we investigate how far an advanced video generation model such as Veo-3 can support generalizable robotic manipulation. We first study a zero-shot approach in which Veo-3 predicts future image sequences from current robot observations, while an inverse dynamics model (IDM) recovers the corresponding robot actions. The IDM is trained solely on random-play data, requiri