sim · medium · grasping · metric: varies

Obstruction reasoning for robotic grasping

Description

Successful robotic grasping in cluttered environments requires a model not only to visually ground a target object but also to reason about the obstructions that must be cleared first. While current vision-language embodied reasoning models show emergent spatial understanding, they remain limited in obstruction reasoning and accessibility planning. To bridge this gap, we present UNOGrasp, a learning-based vision-language model capable of performing visually grounded obstruction reasoning […]
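As a toy illustration only (not UNOGrasp's actual method, whose details are in the paper), the "clear obstructions before grasping" idea can be framed as ordering removals over a hypothetical occlusion graph, where each object maps to the set of objects blocking it:

```python
from collections import defaultdict, deque

def clearance_order(blocks, target):
    """Return a pick order in which every object is grasped only after
    everything obstructing it has been removed.

    blocks: dict mapping object -> set of objects that obstruct it.
    Only objects that (transitively) block `target` are considered.
    """
    # Collect the target plus everything transitively blocking it.
    needed, stack = set(), [target]
    while stack:
        obj = stack.pop()
        if obj in needed:
            continue
        needed.add(obj)
        stack.extend(blocks.get(obj, ()))

    # Kahn's algorithm: blockers come before the objects they block.
    indeg = {obj: 0 for obj in needed}
    blocked_by = defaultdict(list)  # blocker -> objects it obstructs
    for obj in needed:
        for blocker in blocks.get(obj, ()):
            indeg[obj] += 1
            blocked_by[blocker].append(obj)

    order = []
    queue = deque(obj for obj in needed if indeg[obj] == 0)
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for nxt in blocked_by[obj]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return order  # ends with `target` when the occlusion graph is acyclic
```

For example, if a book lies on a box that in turn rests against the target mug, `clearance_order({"mug": {"box", "book"}, "box": {"book"}, "book": set()}, "mug")` yields `["book", "box", "mug"]`. A real system must additionally ground these objects visually and handle partial observability, which is the harder problem the benchmark targets.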

Source

http://arxiv.org/abs/2511.23186v1