sim · medium · manipulation · metric · varies
SaPaVe: Towards Active Perception and Manipulation in Vision-Language-Action Models for Robotics
Description
Active perception and manipulation are crucial for robots to interact with complex scenes. Existing methods struggle to unify semantic-driven active perception with robust, viewpoint-invariant execution. We propose SaPaVe, an end-to-end framework that jointly learns these capabilities in a data-efficient manner. Our approach decouples camera and manipulation actions rather than placing them in a shared action space, and follows a bottom-up training strategy: we first train semantic camera contro
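
The idea of decoupling camera and manipulation actions, rather than emitting one shared action vector, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from SaPaVe: the names `CameraAction`, `ManipulationAction`, `linear_head`, and `decoupled_policy` are hypothetical, the camera is assumed to have two degrees of freedom (pan and tilt), and plain linear maps stand in for whatever learned heads the actual model uses.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CameraAction:
    # Hypothetical 2-DoF camera command; the source does not specify the DoF.
    pan: float
    tilt: float


@dataclass
class ManipulationAction:
    # Hypothetical arm/hand command, one delta per controlled joint.
    joint_deltas: List[float]


def linear_head(features: Sequence[float],
                weights: Sequence[Sequence[float]]) -> List[float]:
    """Apply a plain linear map: one output value per weight row."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def decoupled_policy(
    features: Sequence[float],
    camera_weights: Sequence[Sequence[float]],
    manip_weights: Sequence[Sequence[float]],
) -> Tuple[CameraAction, ManipulationAction]:
    """Predict camera and manipulation actions with two separate heads.

    The heads share the input features but have their own parameters,
    so neither output lives in a shared action vector.
    """
    pan, tilt = linear_head(features, camera_weights)  # expects 2 weight rows
    joints = linear_head(features, manip_weights)
    return CameraAction(pan, tilt), ManipulationAction(joints)
```

Because the two heads hold separate parameters in this sketch, one could in principle be fitted or frozen without touching the other, which is the property that a shared action vector does not offer directly.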