sim · medium · grasping · metric · varies
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Description
Robots operating in human-centric environments require the integration of visual grounding and grasping capabilities to effectively manipulate objects based on user instructions. This work focuses on the task of referring grasp synthesis, which predicts a grasp pose for an object referred to through natural language in cluttered scenes. Existing approaches often employ multi-stage pipelines that first segment the referred object and then propose a suitable grasp, and are evaluated in private datase