sim · medium · vision-robot · metric · varies

Demo-Pose: Depth-Monocular Modality Fusion For Object Pose Estimation

Description

Object pose estimation is a fundamental task in 3D vision with applications in robotics, AR/VR, and scene understanding. We address the challenge of category-level 9-DoF pose estimation (6D pose + 3D size) from RGB-D input, without relying on CAD models during inference. Existing depth-only methods achieve strong results but ignore semantic cues from RGB, while many RGB-D fusion models underperform due to suboptimal cross-modal fusion that fails to align semantic RGB cues with 3D geometric repres

Source

http://arxiv.org/abs/2603.27533v1