sim · medium · manipulation-data · metric: varies

RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

Description

The diversity, quantity, and quality of manipulation data are critical for training effective robot policies. However, hardware and physical setup constraints make it difficult to collect real-world manipulation data at scale across diverse environments. Recent work augments manipulation data with text-prompt-conditioned image diffusion models that alter the backgrounds and tabletop objects in the visual observations. However, these approaches often overlook the practical …

Source

http://arxiv.org/abs/2601.05241v1