sim · medium · manipulation-data · metric: varies
Object-Scene-Camera Decomposition and Recomposition for Data-Efficient Monocular 3D Object Detection
Description
Monocular 3D object detection (M3OD) is intrinsically ill-posed; hence, training a high-performance deep-learning-based M3OD model requires a huge amount of labeled data with complex visual variations from diverse scenes, a variety of objects, and different camera poses. However, we observe that, due to strong human bias, the three independent entities, i.e., object, scene, and camera pose, are always tightly entangled when an image is captured to construct training data. More specifically, specific 3D