sim · medium · vision-robot · metric: varies
VideoWeaver: Multimodal Multi-View Video-to-Video Transfer for Embodied Agents
Description
Recent progress in video-to-video (V2V) translation has enabled realistic resimulation of embodied AI demonstrations, allowing pretrained robot policies to transfer to new environments without additional data collection. However, prior work operates on only a single view at a time, whereas embodied AI tasks are commonly captured from multiple synchronized cameras to support policy learning. Naively applying single-view models independently to each camera leads to inconsi