sim · medium · manipulation-data · metric · varies
RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation
Description
The diversity, quantity, and quality of manipulation data are critical for training effective robot policies. However, hardware and physical setup constraints make large-scale real-world manipulation data difficult to collect across diverse environments. Recent work uses text-prompt-conditioned image diffusion models to augment manipulation data by altering the backgrounds and tabletop objects in the visual observations. However, these approaches often overlook the practical