sim · medium · manipulation-data · metric · varies

Object-Scene-Camera Decomposition and Recomposition for Data-Efficient Monocular 3D Object Detection

Description

Monocular 3D object detection (M3OD) is intrinsically ill-posed, so training a high-performance deep-learning-based M3OD model requires a very large amount of labeled data with complex visual variation across diverse scenes, objects, and camera poses. However, we observe that, due to strong human bias, the three independent entities, i.e., object, scene, and camera pose, are always tightly entangled when an image is captured to construct training data. More specifically, specific 3D

Source

http://arxiv.org/abs/2602.20627v1