sim · medium · manipulation-data · metric · varies
GeoLanG: Geometry-Aware Language-Guided Grasping with Unified RGB-D Multimodal Learning
Description
Language-guided grasping has emerged as a promising paradigm for enabling robots to identify and manipulate target objects through natural language instructions, yet it remains highly challenging in cluttered or occluded scenes. Existing methods often rely on multi-stage pipelines that separate object perception and grasping, which leads to limited cross-modal fusion, redundant computation, and poor generalization in cluttered, occluded, or low-texture scenes. To address these limitations, we pr