sim · medium · grasping · metric varies
MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping
Description
Robotic manipulation of unseen objects via natural-language commands remains challenging. Language-driven robotic grasping (LDRG) predicts stable grasp poses from a natural-language query and an RGB-D image. We propose MapleGrasp, a novel framework that leverages mask-guided feature pooling for efficient vision-language-driven grasping. Our two-stage training first predicts segmentation masks from CLIP-based vision-language features. The second stage pools features within these masks to generate p
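The core pooling step described above can be sketched in a few lines: features are averaged only over the pixels selected by the predicted mask, so the pooled vector summarizes the referred object rather than the whole scene. This is a minimal illustrative sketch with hypothetical names and shapes; the paper's actual implementation may differ.

```python
import numpy as np

def mask_guided_pool(features, mask):
    """Average-pool a feature map over the masked region.

    features: (C, H, W) feature map (e.g. from a CLIP-based encoder).
    mask:     (H, W) soft or binary segmentation mask in [0, 1].
    Returns a (C,) pooled feature vector; the small epsilon guards
    against an empty mask.
    """
    weights = mask / (mask.sum() + 1e-6)          # normalize mask to sum to ~1
    return (features * weights).sum(axis=(1, 2))  # weighted average per channel
```

In a pipeline like the one described, this pooled vector would then feed a grasp-prediction head, focusing it on the object named in the query.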