sim · medium · manipulation · metric · varies

SaPaVe: Towards Active Perception and Manipulation in Vision-Language-Action Models for Robotics

Description

Active perception and manipulation are crucial for robots to interact with complex scenes. Existing methods struggle to unify semantic-driven active perception with robust, viewpoint-invariant execution. We propose SaPaVe, an end-to-end framework that jointly learns these capabilities in a data-efficient manner. Our approach decouples camera and manipulation actions rather than placing them in a shared action space, and follows a bottom-up training strategy: we first train semantic camera contro

Source

http://arxiv.org/abs/2603.12193v1