sim · medium · robotics · metric · varies

VolumeDP: Modeling Volumetric Representation for Manipulation Policy Learning

Description

Imitation learning is a prominent paradigm for robotic manipulation. However, existing visual imitation methods map 2D image observations directly to 3D action outputs, imposing a 2D-3D mismatch that hinders spatial reasoning and degrades robustness. We present VolumeDP, a policy architecture that restores spatial alignment by explicitly reasoning in 3D. VolumeDP first lifts image features into a Volumetric Representation via cross-attention. It then selects task-relevant voxels with a learnable
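As a rough illustration of the lifting step described above, the following minimal sketch shows generic scaled dot-product cross-attention in which each voxel query attends over a set of image feature tokens. It is not the authors' implementation: the function names, the toy sizes, and the use of untransformed features as keys and values (no learned projections, no camera geometry) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """For each query (one per voxel), return an attention-weighted
    sum of the value vectors (one per image token)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        fused = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(fused)
    return outputs


# Hypothetical toy sizes: 4 image tokens, 2 voxel queries, feature dim 3.
image_tokens = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
voxel_queries = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Keys and values are both the raw image tokens here (an assumption).
voxel_features = cross_attention(voxel_queries, image_tokens, image_tokens)
print(len(voxel_features), len(voxel_features[0]))
```

Each output row is a convex combination of the image tokens, so a voxel's feature is pulled toward the image regions its query matches best. A real model would add learned query, key, and value projections and some form of positional or geometric encoding.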

Source

http://arxiv.org/abs/2603.17720v1