sim · medium · vision-robot · metric: varies

VideoWeaver: Multimodal Multi-View Video-to-Video Transfer for Embodied Agents

Description

Recent progress in video-to-video (V2V) translation has enabled realistic resimulation of embodied AI demonstrations, allowing pretrained robot policies to transfer to new environments without additional data collection. However, prior work operates on only a single view at a time, whereas embodied AI tasks are commonly captured from multiple synchronized cameras to support policy learning. Naively applying single-view models independently to each camera leads to inconsi

Source

http://arxiv.org/abs/2603.25420v1